[57] ABSTRACT

A process for performing image frame fusion on a pixel- 
by-pixel basis by estimating velocities and occlusions 
between at least two image frames of a sequence of frames. 
For each pixel, the possible matchings are those that mini- 
mize changes in a selected parameter of the image (generally 
the grey-level). The process uses a region-growing proce- 
dure to reduce the number of possible matchings for each 
pixel. The output is a decomposition of the images into 
regions in which pixels move with the same model of 
velocity or are all occluded. The process includes a multi-scale
description of velocities between two images, a multi-scale
segmentation of the images into regions having different
motion, a correcting term for sampling errors and sub-pixel
motion errors, and an occlusion estimation step. A preferred
embodiment employs pyramidal calculations.
IMAGE FRAME FUSION BY VELOCITY
ESTIMATION USING REGION MERGING 

BACKGROUND OF THE INVENTION 

1. Technical Field 

The present invention relates to image processing and 
enhancement by fusion of plural image frames. The inven- 
tion performs frame fusion on a pixel-by-pixel basis by 
estimating velocities and occlusions between two frames. 
For each pixel, the possible matchings are those that mini- 
mize changes in a selected parameter of the image (generally 
the grey-level). 

2. Background Art 

Finding velocities in a sequence of images requires fol- 
lowing points along their motion in the image. That is, when 
not occluded, the pixels of the current image are associated 
with the pixels of the next image. This association is based 
on the relative constancy of a given quantity estimated from 
the images at each pixel. In general, this quantity is the grey
level value of the pixel, since it does not present large 
variation during motion. But, it might be defined on other 
measurements such as the curvature, the gradient and so 
forth. Given a velocity, one can define a measure that can say 
whether or not this velocity is accurate. We call this measure 
the "error". The error is based on the variation of the 
quantity along the motion defined by the velocity. Possible 
velocities will have small errors attached. Estimating the 
velocities in this case consists of finding velocities that have 
small errors. Unfortunately, this property is not enough to 
define a unique velocity field. Indeed, there might exist in 
the next frame many points having the same grey level (or 
other selected quantity) as those of a given point of the 
current frame. This is the well-known aperture problem, 
which must be solved in order to find the velocities. The 
probability of matching plural points in the image with the 
same velocities decreases as the number of points increases. Many
techniques try to exploit this observation. For example, the 
well-known correlation technique tries to match by neigh- 
borhood (generally defined by a square). But, this arbitrary 
neighborhood might be too large and therefore mix points 
having different velocities, or conversely too small to solve 
the aperture problem. The neighborhood around each point 
should be composed of only the points that move with the same
velocity, which set of points shall be referred to in this 
specification as a "region". The problem is then that such 
"regions" are usually defined by velocities while being 
relied upon to provide an estimate of these same velocities. 
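
As an illustration of the "error" attached to a velocity and of the ambiguity described above, the following is a minimal sketch, not the method of the invention. It assumes integer translational velocities and a sum of squared grey-level differences as the error; the names `matching_error` and `zero_error_velocities` are hypothetical.

```python
import numpy as np

def matching_error(frame0, frame1, region_mask, velocity):
    """Sum of squared grey-level differences when the pixels of `region_mask`
    in frame0 are moved by the integer translation velocity = (dy, dx).
    (Assumed error measure; the text only requires a measure of variation.)"""
    dy, dx = velocity
    ys, xs = np.nonzero(region_mask)
    ys2, xs2 = ys + dy, xs + dx
    h, w = frame1.shape
    # Ignore pixels whose destination falls outside the next frame.
    inside = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    diff = frame0[ys[inside], xs[inside]].astype(float) - frame1[ys2[inside], xs2[inside]]
    return float(np.sum(diff ** 2))

def zero_error_velocities(frame0, frame1, region_mask, candidates):
    """Candidate velocities that match the region without any grey-level change."""
    return [v for v in candidates
            if matching_error(frame0, frame1, region_mask, v) == 0.0]

# Grey level equals the column index; the scene moves one pixel to the right.
frame0 = np.tile(np.arange(8), (8, 1))
frame1 = np.roll(frame0, 1, axis=1)
candidates = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

single_pixel = np.zeros((8, 8), dtype=bool)
single_pixel[4, 4] = True
ambiguous = zero_error_velocities(frame0, frame1, single_pixel, candidates)
print(ambiguous)  # several velocities have zero error for this one pixel
```

In this toy image the grey level does not vary vertically, so every candidate with a horizontal component of one pixel has zero error: a small error alone does not define a unique velocity.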

A scene or image can include moving objects. Recovering 
the velocities requires performing a partitioning of the scene 
into objects (regions) and attributing to each region a model 
of velocity. The following sub-problems are easy to solve: 
(a) Given the velocities find the regions; and (b) Given the 
regions find the velocities. Unfortunately, in order to solve 
the entire problem exactly, one has to find regions and
velocities simultaneously. Conventional approaches are
based on the sequential use of techniques which solve one of
the sub-problems stated above. The dominant motion
approach involves processing a sequential estimation of the
dominant motion, and the extraction of the attached region.
Therefore this approach uses techniques that solve the first
sub-problem on velocities that are obtained based upon the
assumption of a dominant motion. A technique disclosed in
Bouthemy et al., "Motion segmentation and qualitative
dynamic scene analysis from an image sequence", The
International Journal of Computer Vision Vol. 10, No. 2,
pages 157-182, April 1993, employs sequential use of
techniques which solve, alternately, the first and then the
second sub-problem. This sequence of processes is not
proved to converge, and requires a good initialization of
both region and velocity estimates. A technique disclosed in
Schweitzer, "Occam algorithms for computing visual
motion", IEEE Transactions on Pattern Analysis and
Machine Intelligence Vol. 17, No. 11, pages 1033-1042
(1995) employs a similar sequential process, but uses a
splitting algorithm where regions are rectangles. This latter
technique is sure to converge, but suffers from the over-
simplification of the description of a region as a rectangle.
Another disadvantage is that the initialization as one region
for the entire picture might lead to a fixed point far from the
solution. The aim of this latter technique is not necessarily
to find the "good" velocities, but to find the best mapping in
terms of compression. The problem with these techniques is
that when solving the first sub-problem, they try to find
velocities from unknown (and therefore possibly
erroneous) regions, and when solving the second sub-problem they
try to find regions from unknown velocities.

Many techniques dealing with the problem of finding a
unique global motion of a scene have been developed
successfully. Even though these techniques cannot in general be
applied to recover multiple motions, some
attempts have been proposed in some particular cases. The
most significant example is the technique of the publication of
Bouthemy et al. referred to above. The hypothesis of a
dominant image motion proposed in Cloutier et al., "Seg-
mentation and estimation of image motion by a robust
method", Proc. IEEE pages 805-809 (1995), assumes that
the observed scene is made from moving objects having
very different sizes (for example a little object and a large
background). A least-median-of-squares estimator based on
optical flow constraints is applied to the entire image to
extract the model of the dominant motion. Then, the first
sub-problem is solved according to the knowledge of the
dominant velocity: the region corresponding to the dominant
motion is found. Once this dominant object has been
detected, it is removed from the region of analysis, and the
same process is repeated on the remaining part of the image.
Two limitations on the use of this technique are: first, the
underlying hypothesis is in general too restrictive for a real
sequence, and, secondly, the link between dominant motion
and dominant object must be investigated. Indeed, once the
dominant motion has been computed, one has to decide for
each point whether or not it moves according to the domi-
nant motion and therefore whether or not it belongs to the
dominant object. This decision is made by local estimates
around each pixel, and by an a priori thresholding, and
therefore is very sensitive to noise.
Bouthemy et al.'s Motion Segmentation

Bouthemy et al. assume in their publication cited above
that they initially have a segmentation of the velocities (for
example obtained by the dominant motion approach), and they
propose a technique to improve its quality. They start their
algorithm with the segmentation $R_i$, $V_i$, where $V_i$ is the
velocity model associated with the region $R_i$. Then, they make
the boundary of the region move in order to decrease an
energy which balances the matching error with the length of
the boundaries. They recompute the velocity within the
region when a significant change of shape of the region
occurs. The initial velocity is used for initialization of the
new estimation. Their algorithm suffers from many problems.
First, the initial segmentation has to be near the solution.
Therefore their algorithm has to be seen as a way to improve
the quality of a velocity estimate rather than an algorithm that
calculates the velocity. Secondly, the algorithm is not proved
to converge. Moreover, it is very sensitive to local extrema.
Thirdly, it attributes one (and only one) velocity to each
region, and the segmentation of the region is based on these
velocities. It is in a sense a segmentation from velocity
estimation, whereas it should be velocity estimation from a
segmentation. Finally, the occlusions are not taken into
account.

Schweitzer's Algorithm. 

The publication by Schweitzer cited above formulates the 
problem of motion estimation as a search for a function that 
can accurately predict frames. It balances the velocity field 
based upon determinations of (a) how good the prediction is 
and (b) how simple it is. The first requirement is measured 
as usual by the error terms. The simplicity of the vector field 
is set by Schweitzer in terms of encoding length. His 
algorithm is based on a segmentation procedure by splitting 
rectangles. Each rectangle is split horizontally or vertically 
into two other rectangles if the splitting increases the quality 
of the prediction more than a cost based on the increase of 
the complexity (appearance of a new rectangular region). 
Unfortunately, given a rectangle, the location of the split or 
boundary is problematic. In the algorithm of Schweitzer, one 
needs estimates of the velocities for each point in the 
rectangles. And, the segmentation depends on the pre- 
calculated velocities. Finally, the rectangle-based segmen- 
tation might not be sufficient to take into account non- 
rectangular objects. 

Morel et al.'s Grey-Scale Segmentation of Images

A grey-scale segmentation technique is disclosed in Morel
et al., "Variational methods in image segmentation", in H.
Brezis, editor, Progress in Nonlinear Differential Equations
and Their Applications, Birkhauser, 1995, which produces a
piece-wise constant image that approximates the original
image. The approximation is scaled: the larger the scale, the
bigger the regions (the pieces of the segmentation). They
propose to balance the quality of the approximation (which
is measured by the grey-level difference between the origi-
nal image and its approximation) by the complexity of the
approximation (measured by the total length of the
boundaries). They initialize the process by considering each
pixel as a region. Then they merge regions if the merging
decreases the following energy:

$$E = \int (u(x) - u_0(x))^2 \, dx + \lambda \, \mathrm{Length}(B_u)$$

where $u_0$ denotes the original image, $u$ its piece-wise con-
stant approximation, $B_u$ the boundaries of the regions of $u$,
and $\lambda$ a scale parameter. The algorithm ends when merging
is no longer possible. Of course, Morel et al.'s algorithm for
segmenting grey-scale images does not give any information
about velocities.
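
The merging test on this energy can be sketched as follows. This is only an illustrative discretization under stated assumptions, not Morel et al.'s implementation: the integral becomes a sum over pixels, each region is approximated by its mean grey level, and the boundary length is counted as the number of 4-neighbour pixel pairs carrying different labels. The function names are hypothetical.

```python
import numpy as np

def segmentation_energy(u0, labels, lam):
    """Discrete energy: squared grey-level error of the piece-wise constant
    approximation plus lam times the boundary length (assumed discretization)."""
    u0 = np.asarray(u0, dtype=float)
    fidelity = 0.0
    for lab in np.unique(labels):
        values = u0[labels == lab]
        # The best constant approximation of a region is its mean grey level.
        fidelity += float(np.sum((values - values.mean()) ** 2))
    # Boundary length: horizontal and vertical neighbour pairs with different labels.
    boundary = (np.count_nonzero(labels[:, 1:] != labels[:, :-1])
                + np.count_nonzero(labels[1:, :] != labels[:-1, :]))
    return fidelity + lam * boundary

def merge_lowers_energy(u0, labels, a, b, lam):
    """True if relabelling region b as region a decreases the energy."""
    merged = np.where(labels == b, a, labels)
    return segmentation_energy(u0, merged, lam) < segmentation_energy(u0, labels, lam)

u0 = np.array([[0, 0, 10, 10],
               [0, 0, 10, 10]])
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
print(merge_lowers_energy(u0, labels, 0, 1, lam=1.0))    # small scale: keep both regions
print(merge_lowers_energy(u0, labels, 0, 1, lam=150.0))  # large scale: merging pays off
```

The two calls show the role of the scale parameter: the same pair of regions stays separate at a small scale and is merged at a large one.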

SUMMARY OF THE DISCLOSURE 

The invention is embodied in a process for obtaining 
information from at least two image frames of a sequence of 
frames, each of the frames including an array of pixels, each 
pixel having an amplitude, one of the two frames being 
designated as a reference frame and the other being a 
non-reference frame, the process including: 

(1) defining a set of velocities with which the motion of 
pixels between the two frames may be modeled; 

dividing each one of the two frames into plural regions; 

(2) determining an error for each one of at least some of 
the velocities by carrying out the following steps for 
each one of the regions and for each union of pairs of 
the regions: 

(A) mapping each pixel of the non-reference frame into 
the reference frame in accordance with the one 
velocity, 
(B) computing an error amount which is a function of
a difference in pixel amplitude attributable to the 
mapping; 

(C) designating a minimum one of the error amounts 
computed for the velocities as the error for the one 
velocity, whereby a respective error is associated 
with each of the regions and with each union of pairs 
of the regions without regard to velocity; and 

(3) merging qualified ones of the regions by the following 
steps: 

(A) computing for each pair of regions a merging scale 
which depends upon a gain including a function of 

(a) the sum of the errors of each pair of regions and 

(b) the error of the union of the pair of regions; 

(B) merging each pair of the regions for which the
merging scale meets a predetermined criteria.

The merging scale preferably depends also upon a cost 
including a function of (a) the sum of the lengths of the 
boundaries of each pair of regions and (b) the length of the 
boundary of the union of the pair of regions. 

The step of determining an error for each one of at least
some of the velocities can include determining an error for 
each one of all of the velocities. 

The process can further include, after the step of merging: 

erasing the individual pairs of regions which have been 
merged and defining their unions as individual regions; 
and 

repeating the steps of (a) computing an error for each one 
of at least some of the velocities, (b) computing a 
merging scale and merging each pair of regions for 
which the merging scale meets the criteria, whereby the 
process includes plural repetitive iterations. 
Preferably the step of determining an error for each one of 
at least some of the velocities includes determining the error 
for a limited set of the velocities, the limited set of the 
velocities corresponding to those velocities associated with 
the N smallest errors computed during a prior iteration of the 
process, wherein N is an integer. 

If each limited set of velocities associated with the N
smallest errors is different for different regions, then the step 
of determining an error includes: 
designating as the maximum error for a given region the 
largest error computed for that region in any prior 
iteration of the process; 
and the step of computing the merging scale includes 
determining for each pair of regions whether a velocity 
included in the limited velocity set of one of the regions 
is not included in the limited velocity set of the other of 
the pair of regions, and assigning as the corresponding 
error for the other region the maximum error. 
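
The bookkeeping described in the two preceding paragraphs can be sketched as follows. This is a minimal sketch under assumptions: errors are stored in a dictionary keyed by a velocity label, and the helper names `limited_velocity_set` and `error_or_max` are hypothetical.

```python
import heapq

def limited_velocity_set(errors, n):
    """Velocities associated with the n smallest errors of a prior iteration."""
    return heapq.nsmallest(n, errors, key=errors.get)

def error_or_max(errors, limited_set, velocity, max_error):
    """Error of `velocity` for a region, or the region's maximum error when the
    velocity is not in that region's limited set."""
    return errors[velocity] if velocity in limited_set else max_error

# Hypothetical errors from a prior iteration for two regions A and B.
errors_a = {"left": 0.2, "right": 3.0, "up": 1.1}
errors_b = {"left": 2.5, "right": 0.4, "up": 0.9}
set_a = limited_velocity_set(errors_a, 2)
set_b = limited_velocity_set(errors_b, 2)
max_b = max(errors_b.values())  # largest error seen so far for region B

# "left" is kept for A but not for B, so B contributes its maximum error.
pair_error_left = errors_a["left"] + error_or_max(errors_b, set_b, "left", max_b)
print(set_a, set_b, pair_error_left)
```

Summing the two per-region errors for the pair is an assumption made for the example; the text above only specifies which error is assigned to the region whose limited set lacks the velocity.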
The mapping includes computing a new pixel amplitude 
in accordance with a weighted average of pixel amplitudes 
mapped into the reference frame, wherein the weight of each 
pixel amplitude mapped into the reference frame is a 
decreasing function of the mapped pixel's distance from a 
given pixel location in the reference frame. 
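
A possible form of this weighted average is sketched below. The text only requires the weight to be a decreasing function of distance; the Gaussian weight, its width `sigma`, the restriction to a single translational velocity and the function name `forward_map_weighted` are all assumptions made for illustration.

```python
import numpy as np

def forward_map_weighted(frame, velocity, sigma=0.5):
    """Map every pixel of `frame` by the translation velocity = (vy, vx) and
    resample on the pixel grid with distance-decreasing (Gaussian) weights."""
    frame = np.asarray(frame, dtype=float)
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ys, xs = ys.ravel(), xs.ravel()
    my, mx = ys + velocity[0], xs + velocity[1]  # mapped (possibly sub-pixel) positions
    # Squared distance from each grid location (rows) to each mapped sample (columns).
    d2 = (ys[:, None] - my[None, :]) ** 2 + (xs[:, None] - mx[None, :]) ** 2
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    out = weights @ frame.ravel() / weights.sum(axis=1)
    return out.reshape(h, w)

frame = np.zeros((5, 5))
frame[2, 2] = 1.0
shifted = forward_map_weighted(frame, (0, 1), sigma=0.1)
print(round(shifted[2, 3], 6))  # the bright pixel lands one column to the right
```

The dense distance matrix keeps the sketch short but is only practical for very small images; a real implementation would restrict the average to nearby mapped samples.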

The mapping step of mapping pixels from the non- 
reference frame to the reference frame is a forward mapping, 
and the process can further include determining which ones 
of the pixels are occluded by carrying out the following 
steps: 

(I) determining which pixels were not matched from the 
non-reference frame to the reference frame by the 
merging step following the forward mapping step and 
removing the pixels not matched from the reference 
frame; 
(II) performing the step of determining an error and the
step of merging, except that the mapping step includes
a backward mapping from the reference
frame to the non-reference frame, the backward mapping
step employing a version of the reference frame in
which the pixels not matched have been removed;

(III) determining which pixels were not matched from the
reference frame to the non-reference frame by the
merging step following the backward mapping step and
removing the pixels not matched from the non-reference
frame;

(IV) comparing the pixels remaining in the reference
frame with the pixels remaining in the non-reference
frame, and repeating steps I, II and III if there is a
difference beyond a predetermined threshold.

The process assigns a velocity to each remaining pixel in
the non-reference frame and then adds the remaining pixels
of the non-reference frame to the reference frame in
accordance with the velocity assigned to each pixel of the
non-reference frame to produce an enhanced frame.

The process can further include deblurring the image of
the enhanced frame to produce a super frame.

The dividing step can initialize the regions so that each
pixel is an individual region.

The model velocities include at least one of: the set of
translational velocities, the set of rotational velocities, or the
set of zooms.

Preferably, the unions of pairs of regions constitute unions
of pairs of adjacent regions only.

The merging scale is computed as a ratio obtained by
dividing the cost by the gain, and wherein the predetermined
criteria includes a maximum scalar value of the ratio above
which merging is disallowed. The scalar value is selected in
a range between an upper limit at which the entire image is
merged and a lower limit at which no pixels are merged.

The process further includes defining the set of velocities
as a simple set during the first one of the iterations of the
process, and supplementing the set of velocities with
additional velocities as the size of the regions grows. Preferably,
the simple set includes the set of translational velocities, and
the additional velocities include the set of rotational
velocities and the set of zoom velocities.

The reference and non-reference frames can lie in a
moving sequence of frames depicting an image having
motion, the process further including designating one of the
sequence of frames as the reference frame and successively
designating others of the sequence of frames as the
non-reference frame, and performing all of the foregoing steps
for each one of the successive designations of the
non-reference frame, whereby the super frame contains
information from all the frames of the sequence. Furthermore, the
process can further include designating successive ones of
the sequence of frames as the reference frame and repeating
all of the foregoing steps for each designation of the
reference frame so that a super frame is constructed for each one
of the sequence of frames.

The step of assigning a velocity to each remaining pixel
can be carried out by selecting the velocity for the region of
that pixel having the minimum error.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block flow diagram of the velocity estimate
process of the invention.

FIG. 2 is a block flow diagram of a region estimation
process employed in the process of FIG. 1.

FIG. 3 is a block flow diagram of a merge process
employed in the process of FIG. 2.

FIG. 4 is a block flow diagram of an update process
employed in the process of FIG. 2.

FIG. 5 is a block flow diagram of a computation of the
merging scale employed in the process of FIG. 4.

FIG. 6 is a block flow diagram of an occlusion estimation
process employed in the process of FIG. 1.

FIG. 7 is a schematic block diagram illustrating the data
organization of a frame fusion process employed in carrying
out the invention.

FIG. 8 is a block flow diagram of the frame fusion
process.

FIG. 9 is a block flow diagram of a data organization step
carried out in the frame fusion process of FIG. 8.

DETAILED DESCRIPTION OF THE
INVENTION

Introduction:

The present invention enhances an image by the fusion of
plural image frames. The invention performs frame fusion
on a pixel-by-pixel basis by estimating velocities and
occlusions between two frames. For each pixel, the possible
matchings are those that minimize changes in a selected
parameter of the image (generally the grey-level). The
invention uses a region-growing procedure to reduce the
number of possible matchings for each pixel. Its output is a
decomposition of the images into regions in which pixels
move with the same model of velocity or are all occluded.
The technique includes a multi-scale description of
velocities between two images, a multi-scale segmentation of the
images into regions having different motion, a correcting
term for sampling errors and sub-pixel motion errors, and an
occlusion estimation step. A preferred embodiment employs
pyramidal calculations.

The problem of defining regions and of estimating
velocities is solved in the present invention by first defining the
regions without any estimate of the velocities. The invention
attaches one velocity model to each region. All points in a
region do not necessarily move with the same velocity,
because the region might be in rotation, and then all the
points of this region have different velocities, but they at
least all belong to one "model" of velocity which is the given
rotation. In one example, all the points of a region can share
the same model of velocity, but this model of velocity can
be defined with different velocities for different points of the
region. There is no limitation in the invention in the possible
models of velocity that can be observed within an object in
a real sequence of images. The only restriction is the
smoothness that a velocity field within an object should
satisfy. In one embodiment, the invention approximates the
entire set of possible models by a limited set of models that
correspond to elementary motions. The limited set may be
refined in a progressive sequence of approximation, starting
with, for example, the set of translations (which corresponds
to a kind of 0th order approximation), then the set of the
affine motions (a first order approximation) and then the set
of quadratic motions (a second order approximation). In any
case, the invention does not rely on a particular choice of a
set of models. Therefore, this set can be defined by the user
according to an expected approximation.

The invention first estimates the regions without using
any estimate of the velocities. A region is
defined as a set of points which can move with the same
model of velocity whatever the velocities. (In contrast,
conventional techniques attribute one particular velocity to
the region.) Then, after having computed the regions, the
velocities are estimated for each region.

A velocity field between two frames is approximated by
a piece-wise regular function, since it is made by moving
objects. Each piece is a region, and the regions make a
partition of the entire image. The regularity of the velocity
field inside each region can be described by a continuous or
discrete set of "velocity models" (set of affine, set of
continuous, set of translation motions, and so forth). The
invention estimates the velocities between two frames by
satisfying the following principle: The velocity field is the
vector field that fits in a minimal number of regions that
constitute a partition of the image. The term "fitting in a
region" means that the vector field is regular (that is in the
given set of models), and minimizes the variation along the
motion of a quantity based on the images (like grey-level
value). The trade-off between the minimal number of
regions and the minimization of discrepancies in grey-level
is scaled by a scale parameter. The scale parameter is natural
and physically built-in, in the sense that the notion of objects
is a scaled notion. The scale corresponds to the degree of
simplification with which motions are observed. At coarse
scale, one only cares about major objects moving and not
little details within these shapes that have some slightly
different motions. However, such details will be present at a
finer scale. Therefore, regions obtained at a coarse scale
must be consistent with the regions obtained at any finer
scale. This constitutes the "inclusion principle" that we
believe is necessary (because physically natural) for any
scaled motion estimator. If the principle leads to non-unique
velocity fields, then among the possibilities, the one which
has the fewest disparities between the velocity fields of the
other regions can be chosen. As a final step, occlusions are
defined by the non-bijective part defined by the velocity
fields, backward and forward. (As employed in this
specification, the term "bijective" refers to
correspondence, e.g., via a velocity, between pixels in
different frames.) Such forward and backward matching
can be performed or computed in the same process.

A preferred embodiment carries out the foregoing by the
following steps:

1. Computation of the regions without any attribution or
estimation of velocities. The computation of the
regions is made by a region merging procedure: two
adjacent regions are merged if according to the images
they can move together (i.e., without loss of
correspondence between corresponding pixels) whatever their
motion is. The merging of two regions follows the
algorithm proposed by Morel et al. for the grey-level
segmentation of the images.

2. While the regions are computed, the possible models of
velocities inside each region are computed. In case of
non-uniqueness, the models that are consistent with the
maximum number of regions are chosen.

3. The occlusions are then computed from the
non-bijective part of the computed matching backward and
forward.

Multi-Scale Segmentation of the Velocities - Formation of
the Regions:

In forming the regions, grouping several pixels can be
disadvantageous because one can group pixels moving with
different velocities. The idea is then to group pixels that "can
all move" with at least one model of velocities. The
invention accomplishes this grouping using the same
region-growing procedure as that disclosed in the publication of
Morel et al. referenced above to segment grey-level images
into piece-wise continuous regions. In order to carry out the
region formation process, the following elements are input:

a set of possible velocity models (piece-wise constant, affine, ...):
$W$

a measure of how accurate is a given velocity field in a
region: $E(R, V)$

a measure defined on the connected sets of all $R$: $M(R)$,
such that the gain of merging: $M(R_i) + M(R_j) - M(R_i \cup R_j)$
is positive,

a scale parameter $\lambda \geq 0$.

Given all these ingredients, there is defined a criteria to
determine whether or not two regions have to be merged. We
said that we want to merge regions that can move with the
same model of velocity. The ability to move is measured
by the minimal error term among the possible models of
velocity, but this error preferably should not be compared
with an arbitrary term. Therefore the invention compares the
minimal error of merged regions, and the minimal errors of
the two regions taken separately. (As understood in this
specification, the term "error" refers to discrepancy in a
selected parameter between corresponding pixels in
different frames, such as a discrepancy in pixel grey level.) Given
two regions $R_i$ and $R_j$, the invention then considers the
quantity:

$$C_E(R_i, R_j) = \min_{V \in W} E(R_i \cup R_j, V) - \left[ \min_{V \in W} E(R_i, V) + \min_{V \in W} E(R_j, V) \right]$$

This quantity is always positive (by definition of the errors);
it represents the cost in terms of the error of the merging
of the two regions $R_i$ and $R_j$. If the two regions can move
with the same velocity model, this cost is small. This cost is
then balanced with the "gain" of merging which is defined
by the functions $M$:

$$G(R_i, R_j) = M(R_i) + M(R_j) - M(R_i \cup R_j)$$

The balance is scaled by the parameter $\lambda$:

$$\mathrm{balance} = C_E(R_i, R_j) - \lambda \, G(R_i, R_j)$$

A negative balance indicates that the "gain" (scaled by $\lambda$) is
bigger than the cost, and therefore the regions can be
merged. Conversely, a positive balance prevents the
merging. The balance decreases from a positive value as $\lambda$
increases, is positive when $\lambda$ is equal to zero, and becomes
negative as $\lambda$ tends to infinity. The scale at which the
balance is negative and merging possible is called the
"merging scale". We define the merging scale between two
regions $R_i$ and $R_j$ by:

$$P(R_i, R_j) = C_E(R_i, R_j) / G(R_i, R_j)$$

Given a scale parameter $\lambda$, all adjacent regions having a
merging scale less than $\lambda$ are merged. The mergings are
ordered with respect to the merging scale.

The Scale Parameter

The parameter $\lambda$ controls the complexity of the result of
the minimization of the cost-functional. By choosing $\lambda = 0$,
one ends up with every pixel as a region. Conversely,
choosing $\lambda$ big enough, there will remain only one region
corresponding to the entire image. We believe the
multi-scale estimation of the velocities is best, since it reflects
natural behavior. For example, the image of a moving car
has a global velocity that fits for the entire region defined by
the car: the observation scale of the velocities is large.
Conversely, by looking closely, all the parts of the car are not
moving with the same velocity where the observation scale
is small. Depending on the application or the information
that needs to be recovered, one will have to look at different
scales. Choosing the optimum value $\lambda$, that is the $\lambda$ which
leads to the expected representation of the true velocity field,
is not necessarily possible without minimum knowledge
about the images. Therefore, it is very important that the
segmentation results present some similarities between
scales. These similarities are usually set in the following
principle, hereinafter referred to as the inclusion principle,
that any scaled segmentation should satisfy: $\forall \lambda, \mu$ such that
$\lambda > \mu \geq 0$, the set of the regions obtained at scale $\lambda$ is a set of
subsets of the regions obtained at the scale $\mu$. In other words,
the regions at scale $\lambda$ can be obtained by merging the regions
obtained at scale $\mu$. $E(R, V)$ measures how accurate is the
velocity model $V$ in the region $R$. In the ideal case, if $V$ is
the true velocity, one has $E(R, V) = 0$. Therefore, choosing $\lambda$
very small, region-merging will occur until the entire region
moving with $V$ is recovered, while any other merging will
merge regions moving with different velocity models and
thus create errors. Unfortunately, in practice, the value of
$E(R, V)$ is corrupted by the noise, and also by the fact that
the selected set of models of $V$ is an approximation of the
possible true velocities. These two kinds of perturbations
cannot be quantified a priori, and therefore produce
arbitrarily scaled values of errors. These errors have to be balanced
with the scale parameter, and this is also the role of $\lambda$.
Depending on the shape of the error $E(R, V)$, these two
perturbations will have different
consequences on the estimate of $E(R, V)$.
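
The merging criterion described above (cost of merging, gain of merging, and their ratio compared with the scale parameter) can be sketched as follows. This is a minimal sketch under assumptions: the per-model errors of each region and of the union are supplied as dictionaries, the measure M is supplied as plain numbers (for example a boundary length), and the function names are hypothetical.

```python
def min_error(errors_by_model):
    """Minimal error of a region over the available velocity models."""
    return min(errors_by_model.values())

def merging_scale(err_i, err_j, err_union, m_i, m_j, m_union):
    """Ratio cost / gain for a pair of adjacent regions.

    cost: minimal error of the union minus the minimal errors of the two
          regions taken separately.
    gain: M(R_i) + M(R_j) - M(R_i u R_j), assumed strictly positive.
    """
    cost = min_error(err_union) - (min_error(err_i) + min_error(err_j))
    gain = m_i + m_j - m_union
    if gain <= 0:
        raise ValueError("the gain of merging is assumed to be positive")
    return cost / gain

def should_merge(scale, lam):
    """Adjacent regions are merged when their merging scale is below lam."""
    return scale < lam

# Hypothetical errors for two velocity models in each region and in the union.
err_i = {"translation_a": 1.0, "translation_b": 4.0}
err_j = {"translation_a": 5.0, "translation_b": 1.0}
err_union = {"translation_a": 6.0, "translation_b": 5.0}
scale = merging_scale(err_i, err_j, err_union, m_i=8.0, m_j=8.0, m_union=12.0)
print(scale, should_merge(scale, lam=1.0), should_merge(scale, lam=0.5))
```

Because each pair has a single merging scale, increasing the scale parameter can only add merges to those already made at a smaller value, which is how this sketch relates to the inclusion principle.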
Link Between Scale and Noise Ratios.

In the presence of noise, the error $E(R, V)$ is corrupted by
the same amount of noise as the image is corrupted.
Therefore, at small scales the velocity estimations are cor-
rupted by noise. These perturbations disappear at a scale that
is linked to the variance of the noise.
Link Between Scale and Difference Between Models of V
and True Velocities.

In the same way, if V 0 is the true velocity flow in a region 
R, and V is the best velocity model that approximates V 0 , 
writing V=V 0 +e, one has 

\E(R, V)-E(R,Vo)\^M\ Rt2 J~\\ 

where ||.||_{R,2} stands for the L2 norm in the region R. E(R, V)
is the error that will be found, and E(R, V_0) is the error that
would be obtained if V were the true velocity. The extra error,
which appears here as the difference, depends on the
difference between the true velocity and the modeled
velocity. This extra error will be balanced with the gain of
merging, and therefore the scale is also related to the
approximation of the velocities made by the set of models.
Occlusions.

Since videos are made of moving objects, some points
present in one image might be occluded in the next one, and
conversely points of the next image can be occluded in the
current one. The problem is how to define a criterion which
determines whether or not a point of a given image is present
or occluded in the other image. Conventional criteria are
based on the size of the matching errors, and therefore
involve a kind of threshold. In the present invention,
occlusions are preferably defined without grey-level
threshold-based techniques, but rather as a direct result of the velocity
estimate. The invention uses the fact that the (true) velocity
field is bijective on the non-occluded parts. Therefore,
finding occlusions relies on the decomposition of the
matching into two components: bijective and non-bijective ones.
The points related to the non-bijective part are the
"occlusion" points. In other words, given two images u_1 and u_2, let
X_1 and X_2 respectively be the sets of pixels of u_1 and u_2:
the matching of u_1 towards u_2 associates each
pixel of X_1 to a pixel of X_2. But not necessarily all the pixels
of X_2 are associated to a pixel of X_1 by this matching. (The
matching is not necessarily bijective.) Therefore X_2 can be
decomposed into two sets: H_2 and O_2. H_2 is the "hits"
subset, that is the subset of the points of X_2 that are
associated to at least one point of X_1. And O_2 is the
complementary subset of H_2 in X_2. Then, any point of O_2 is
not, according to the matching, present in X_1. Therefore the
points of O_2 are some points of X_2 which are occluded in X_1.
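
The decomposition into hits and occlusions can be illustrated with a minimal sketch. It assumes, for illustration only, that the matching is available as a dictionary from pixels of the first image to pixels of the second; the function name and the 1-D toy data are hypothetical and are not part of the described process.

```python
def split_hits_and_occlusions(x1, x2, match):
    """Split x2 into the hit set H2 and its complement O2.

    x1, x2: sets of pixels; match: dict mapping a pixel of x1 to a pixel of x2
    (an assumed representation of the matching of u1 towards u2).
    """
    hits = {match[p] for p in x1 if p in match and match[p] in x2}
    occluded = set(x2) - hits
    return hits, occluded


# 1-D toy example: every pixel moves one step to the right.
x1 = {0, 1, 2, 3}
x2 = {0, 1, 2, 3}
match = {0: 1, 1: 2, 2: 3}          # pixel 3 leaves the image
hits, occluded = split_hits_and_occlusions(x1, x2, match)
```

In this toy case pixel 0 of the second image receives no match, so it is the only point of O_2.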
Measuring the Accuracy of a Velocity Field, and Sampling 
Problems. 

Continuous image: Classical definitions of optical flow
involve a kind of conservation assumption: points moving
with optical flow have a constant grey level (or a small
variation of their grey level), except for the occluded
pixels. Thus, the velocity field V between frame u_1 and
frame u_2 is accurate if the quantity:

$$\mathrm{er}(x, V) = |u_1(x) - u_2(x + V(x))|$$

is small for every non-occluded point. Thus, the quantity
er(x, V) defines a possible measure of accuracy of the
velocities.

Discrete image: A discrete image is represented by an array
of pixels: u(i,j), which is a representation of the real
intensity function u(x). u(i,j) corresponds to a mean value
of u on the pixel surface. The problem is how to translate
the quantity to a discrete formulation. Given the two
images u_1(i,j) and u_2(i,j), the process compares the value
of u_1 at the pixel (i,j), with the value of u_2 at the "position"
(i,j)+V(i,j). The problem is that this position is not
necessarily an integer (for a non-integer shift) and then does
not denote a pixel. Therefore the value u_2((i,j)+V(i,j)) is
not defined. Moreover this value cannot in general be
deduced from the values of the image at the neighbor
pixels. One possible solution to this problem is to subtract
from the error term any possible contribution of the
sampling error. This possible contribution depends on the
shape of the intensity function u. For a u which has its
variation bounded, we propose the following definition
for the sampling error:

eK(u}.v>nii«dM^ 

where BLI(u,x) denotes a bi-linear interpolation of u at
location x, α is the minimal square-distance between V(i,j)
and the center of a pixel (therefore α is always less than one
half the size of a pixel), and L is linked to the bound of the
variation of u by L = max_{x,e} |u(x)−u(x+e)|, where e is a unit
vector. Since L is unknown, we propose to approximate
its value by:

$$L = \min\Big(\max_{e} |u_1(i,j) - u_1((i,j)+e)|,\ \max_{e} |u_2(\tilde{i},\tilde{j}) - u_2((\tilde{i},\tilde{j})+e)|\Big)$$

where (ĩ,j̃) denotes the pixel nearest to (i,j)+V(i,j), and e
denotes a unit pixel shift in any direction (1,0), (1,1),
(0,1), . . .
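
As an illustration of the approximation of L, the following sketch takes the smaller of the two local grey-level variations, one around pixel (i,j) in the first image and one around the pixel nearest to (i,j)+V(i,j) in the second. The function names, the use of the 8 neighbours as unit shifts, and the clamping of the target pixel to the image are assumptions made for the example only.

```python
import math


def local_variation(img, i, j):
    """Largest absolute grey-level change from (i, j) to any of its 8 neighbours."""
    rows, cols = len(img), len(img[0])
    best = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                best = max(best, abs(img[i][j] - img[ni][nj]))
    return best


def estimate_L(u1, u2, i, j, v):
    """Approximate L at pixel (i, j) for the velocity v = (vi, vj)."""
    # Nearest pixel to (i, j) + V(i, j); clamped to the image (assumption).
    ti = min(max(math.floor(i + v[0] + 0.5), 0), len(u2) - 1)
    tj = min(max(math.floor(j + v[1] + 0.5), 0), len(u2[0]) - 1)
    return min(local_variation(u1, i, j), local_variation(u2, ti, tj))


u1 = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
u2 = [[0, 0, 0], [0, 0, 0], [0, 0, 4]]
L_est = estimate_L(u1, u2, 1, 1, (0.6, 0.7))
```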

Accuracy of a velocity field on a region: The measure of
accuracy is simply defined by adding the quantities
e(V(i,j),(i,j)) for each pixel (i,j) of the region: we set

$$E(R, V) = \sum_{(i,j) \in R} e(V(i,j), (i,j))$$

Since the error on a region is the sum of the errors of the
pixels of the region, we always have:

$$E(R_1 \cup R_2, V) = E(R_1, V) + E(R_2, V)$$

for any two regions R_1 and R_2 that do not intersect themselves. Moreover, one has also

$$\min_{V} E(R_1 \cup R_2, V) \geq \min_{V} E(R_1, V) + \min_{V} E(R_2, V)$$

Therefore the cost of the merging as defined above is either positive or zero.

The Gain of Merging:

The gain of merging is a measure of improvement in a selected parameter obtained by merging two particular regions into one. The gain of merging can be defined as a measure on the sets of R. However, other definitions are possible. Given two regions R_i and R_j, the gain of merging is defined by:

$$G(R_i, R_j) = M(R_i) + M(R_j) - M(R_i \cup R_j)$$

Merging of two particular regions is allowed only if M is such that the gain of merging is positive. Otherwise, merging is not allowed. There are many different candidates for M. The most simple is to set M(R)=1 for all R, so that the gain of merging does not depend on the regions, since it is always equal to 1. One can also set M(R)=Length of the boundary of R. This choice tends to minimize region boundary lengths by penalizing regions having long boundaries. Such regions can have strange shapes, and therefore might be unnatural in an image. This choice provides regions having more regular boundaries than the other choices.

General Process of the Invention

The general process of the invention, illustrated in the block flow diagram of FIG. 1, consists of the following steps:

1. Region estimation (block 100 of FIG. 1). This is the main part of the velocity estimation algorithm. Regions are found using a merging procedure in which two regions are merged if they can move with the same model of velocity, regardless of which particular velocity model suffices. The merging operations are sorted according to a "merging scale" as described below. This step finishes when merging is no longer possible. The inputs of this step are

    an initial segmentation (generally each pixel is a region), and the two images.

    a set of possible velocity models (piece-wise constant, affine, and so forth): W, or a progressively inclusive set of the foregoing sets.

    a measure of how accurate a given velocity field is in a region: E(R, V).

    a measure defined on the connected sets of R, M(R), such that the gain of merging: M(R_i)+M(R_j)−M(R_i∪R_j) is positive.

    a scale parameter: λ≥0.

The outputs of this step are

    the final segmentation list and description of the regions.

    for each region, the possible models of velocity and their respective errors.

2. For each region, selection of one velocity model (among the set of possible velocity models) is made according to global criteria (FIG. 1, block 120). After the preceding step there might be several possible velocity models attached to each region. The aim of the step is to reduce the number of velocity models attached to each region according to some global criteria.

The inputs of this step are

    some global criteria,

    the output of the step 1.

The output of this step is

    a list of the regions and, for each region, a list of the remaining models.

3. Occlusions estimation (FIG. 1, block 140). The output from the preceding steps is used to find the pixel-to-pixel matchings between the two frames, forward and backward. The occlusions are estimated from the non-bijective part of the image defined by these matchings. The input of this step is:

    the regions and the possible models of velocities for the matchings, forward and backward.

The output of this step is:

    the sets of the occlusions of the matchings, backward and forward.

4. Improvement of the velocity estimates within each region, an optional step (FIG. 1, block 160). In each region, an attempt is made to find the best velocity field in a set of models larger than the models found in the preceding step. This step can be carried out with standard minimization techniques.

This treatment separates the occlusions estimation from the region estimation and velocity estimation steps. However, it is possible to couple these steps in any one of many possible implementations, which will not be described herein for the sake of brevity.

Each of the four steps in the foregoing process will now be described individually.

REGIONS ESTIMATION

Direct minimization: The regions estimation sub-process is illustrated in the block flow diagram of FIG. 2. In this sub-process, each pair of regions has the following merging scale:

$$P(R_i, R_j) = \frac{\min_{V \in W} E(R_i \cup R_j, V) - \min_{V \in W} E(R_i, V) - \min_{V \in W} E(R_j, V)}{M(R_i) + M(R_j) - M(R_i \cup R_j)}$$

This scale corresponds in a sense to the minimal scale at which merging of the two regions decreases the energy. Therefore two regions having a small merging scale are more similar than two regions having a large one. Two regions are merged if the merging scale is less than or equal to the current scale. This is carried out in the following sub-steps:

1. Initialization (FIG. 2, block 200). The initial regions are defined. Generally each pixel defines a region, but other initializations are possible. The merging scales between all pairs of neighboring regions are computed and added to a list of possible mergings.

2. Segmentation (FIG. 2, block 220). R_i and R_j denote the regions having the smallest merging scale P(R_i, R_j) in the list of possible mergings. If this merging scale is larger than the specified scale λ_0 ("NO" branch of block 220), then the process goes to the third step, i.e., the step of block 280 of FIG. 2. Otherwise ("YES" branch of block 220), the two regions are merged into a new region R (block 240). An update operation (block 260 of FIG. 2) is performed as follows: the regions R_i and R_j are erased from the segmentation, as well as their merging scales with other regions. The merging scales of R and its neighbor regions are computed and added to the list of possible mergings.

3. Attribution of possible velocity models to each remaining region (block 280 of FIG. 2). In each remaining region, the velocity models having the smallest errors are selected as the possible velocity models inside the region, other models being discarded on a region-by-region basis. Determining the "smallest" errors is a user-defined process and can be readily constructed by the skilled worker. Depending upon the noise and other parameters, the models may be selected according to the following criteria:

    The models that minimize the error terms. (There might be more than one minimum.)

    The models that have errors less than a quantity linked to the noise.

    Other criteria defined by the user.
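
The initialization and segmentation sub-steps can be sketched as a greedy loop. The sketch below is only an illustration under simplifying assumptions that are not part of the specification: the image is a 1-D periodic signal, the models are integer shifts, the per-pixel error is the plain grey-level difference (no sampling correction), M(R)=1 so that the gain of merging is always 1, and all per-model errors are stored (the direct method). All function and variable names are hypothetical.

```python
def region_errors(u1, u2, pixels, models):
    """E(R, V) for each model V: summed grey-level mismatch over the region."""
    n = len(u2)
    return [sum(abs(u1[x] - u2[(x + v) % n]) for x in pixels) for v in models]


def merge_regions(u1, u2, models, lam0):
    """Greedy merging of adjacent regions up to the scale lam0."""
    # Initialization: every pixel is its own region.
    regions = [([x], region_errors(u1, u2, [x], models)) for x in range(len(u1))]
    while len(regions) > 1:
        best_k, best_scale = 0, None
        for k in range(len(regions) - 1):
            err_a, err_b = regions[k][1], regions[k + 1][1]
            # E is additive over disjoint regions, so per-model errors add.
            merged = [a + b for a, b in zip(err_a, err_b)]
            cost = min(merged) - min(err_a) - min(err_b)
            gain = 1.0  # M(R) = 1 gives M(Ri) + M(Rj) - M(Ri U Rj) = 1
            scale = cost / gain
            if best_scale is None or scale < best_scale:
                best_k, best_scale = k, scale
        if best_scale > lam0:
            break  # smallest merging scale exceeds the specified scale
        (pix_a, err_a), (pix_b, err_b) = regions[best_k], regions[best_k + 1]
        merged = [a + b for a, b in zip(err_a, err_b)]
        regions[best_k:best_k + 2] = [(pix_a + pix_b, merged)]
    return regions


u1 = [1, 4, 2, 8, 5, 7]
u2 = [7, 1, 4, 2, 8, 5]          # u1 shifted right by one pixel (periodic)
models = [-1, 0, 1]
regions = merge_regions(u1, u2, models, lam0=0.0)
pixels, errors = regions[0]
best_model = models[errors.index(min(errors))]
```

Because every pixel of this toy signal moves with the same shift, all merging scales are zero and the loop ends with a single region whose best model is the shift of +1. A practical implementation would keep the list of possible mergings sorted instead of rescanning all pairs.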
The most important part in terms of computation is the
estimation of the merging scale for each pair of neighbor
regions. There are different strategies that can be employed
for estimating P(R_i, R_j), depending upon the "shape" of the
set of the models of velocities: discrete space or continuous
space. The merging scale between two regions R_i and R_j
involves the computation of the two quantities:
min_{V∈W} E(R_i∪R_j, V), and M(R_i∪R_j) (the others are computed during
the creation of R_i and R_j, and therefore need not be
re-computed). The estimate of M(R_i∪R_j) in general is
without heavy computational cost. But the other term
requires the computation of the minimal error on the merged
region over all possible velocity models in the set of velocity
models. For discrete images and models, it is always
possible to compute for all velocity models the corresponding
error and then take the minimum. This method shall be
referred to herein as the direct method. Such an approach
provides exact results but in general is computationally
burdensome. Instead of the direct method, a pyramidal
method is preferred which attempts to minimize the number
of calculations, and is described below. The result of the
pyramidal method is an approximation but is computationally
far faster than the direct method. In the case of a
continuous space of models, the direct method is only possible
if one has (depending on the chosen set of models) an
algorithm to find the global minima of the errors among the
models. If only algorithms that find local minima are
available (e.g., gradient descent algorithms), it is preferred to use
a continuous technique similar to the pyramidal technique
for discrete images and velocity models described below.
This technique is based on storage of the local minima for
each region in order to optimize the computation of the
global minimum for the merged region.

Discrete space of velocity models (fast implementation):
The following fast implementation is the preferred one. This
method employs a discrete space of velocity models:
W={V_1, . . . , V_nb} where nb is the number of models. Since
the images are sampled with pixels, it is always possible to
consider a discrete space of velocities instead of a continuous
space. The most standard set of velocities is the set of the
pixel translations, where W is defined by

$$W = \{(i, j),\ \text{for } i, j \in \{-V, \ldots, V\}\}$$

where V is the maximal velocity expected (the size of the
image for example).
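
For illustration, the set of pixel translations can be enumerated as follows. This is a minimal sketch; the function name and the choice to number the models in row-major order are assumptions made for the example.

```python
from itertools import product


def translation_models(v_max):
    """All integer translations (i, j) with i and j in {-v_max, ..., v_max}."""
    return [(i, j) for i, j in product(range(-v_max, v_max + 1), repeat=2)]


models = translation_models(2)   # (2*2 + 1)**2 translation models
```

The number of models grows as (2V+1)^2, which is why storing one error per model and per region quickly becomes expensive for large V.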

Pyramidal computation of the errors: The merging scale
between two regions R_i and R_j is:

$$P(R_i, R_j) = \frac{\min_{V \in W} E(R_i \cup R_j, V) - \min_{V \in W} E(R_i, V) - \min_{V \in W} E(R_j, V)}{M(R_i) + M(R_j) - M(R_i \cup R_j)}$$

Computation of the foregoing requires that the following
quantities be estimated each time a determination must be
made whether to merge two regions R_i and R_j:
min_{V∈W} E(R_i∪R_j, V), and M(R_i∪R_j). While the latter quantity is in
general very simple to estimate, the former one is
computationally intensive. Direct computation of the minimum
requires calculating the error of matching for each velocity
model for the entire region R_i∪R_j. In other words, it
requires an estimate of E(R_i∪R_j, V) for all V.

For all V, E(R_i∪R_j, V) is the sum of the two quantities E(R_i, V)
and E(R_j, V), which have been computed previously at the
merging steps that created R_i and R_j. Therefore, E(R_i∪R_j, V)
can be computed as the sum of two quantities already
computed. Unfortunately, for a large number of velocity
models, it is impossible to store all these quantities for each
remaining region. This problem is solved in the preferred
method, which is to store only the quantities which, when
added together, tend to produce the minimum over V of
E(R_i∪R_j, V). In other words, what is stored is the "N_i"
smallest values of E(R_i, V) and the "N_j" smallest values of
E(R_j, V), and their corresponding models of velocity. The
two numbers "N_i" and "N_j" have to be chosen as a
compromise between exact minimization and memory
allocation. This associates with each region a "List" of the best
matchings. The List contains the N best errors, and their
corresponding models of velocity. The List for the region
R_i∪R_j, and the quantity min_V E(R_i∪R_j, V), are then
computed directly and uniquely from the Lists of R_i and R_j. That
is why the computations are pyramidal in the preferred
embodiment.

The Pyramidal Method:

(1) Initialization: The first step is computing the Lists of
possible velocities for each initial region. In practice, the
initial regions are the pixels of the image.

However, the initialization is valid for any set of regions
that do not intersect themselves. For each initial region R:

1. Compute for each V_i in W the associated error:
e_i=E(R, V_i).

2. Take the velocity models which have the "N(R)"
smallest associated errors: V_{i_0}, . . . , V_{i_{N(R)−1}}, and their
associated errors: e_{i_0}, . . . , e_{i_{N(R)−1}}. The number N(R)
is fixed by a user-defined procedure which is discussed
below under the heading "Using the Pyramidal
Method".

3. Set Max_e(R) equal to the next smallest error.

4. Sort the list with respect to the velocity model
numbering, with indices as follows: 0≤i_0<i_1< . . .
<i_{N(R)−1}≤nb. L(R) is the list formed by these indices.
The list L(R) is characterized as follows: V∈L(R) if ∃k∈L(R)
such that V=V_k, and its associated error in R is then e(R, V).
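
The list construction of step (1) can be sketched as below. The sketch also includes the list merge that step (2), which follows in this description, goes on to specify, so that the two can be read together: a model present in only one list is charged the other region's maximal error Max_e. The data layout (a list of (model index, error) pairs plus a separate Max_e value), the function names, and the use of infinity when no "next smallest" error exists are assumptions made for the example.

```python
def build_list(errors, n_keep):
    """Keep the n_keep smallest errors; errors[k] is the error of model k.

    Returns (entries, max_e): entries are (model index, error) pairs sorted
    by model index, and max_e is the next smallest error.
    """
    order = sorted(range(len(errors)), key=lambda k: (errors[k], k))
    kept = sorted(order[:n_keep])
    # If every model is kept there is no next smallest error (assumption: inf).
    max_e = errors[order[n_keep]] if n_keep < len(errors) else float("inf")
    return [(k, errors[k]) for k in kept], max_e


def merge_lists(list_i, max_i, list_j, max_j, n_keep):
    """Build the list of the merged region from the two lists only."""
    e_i, e_j = dict(list_i), dict(list_j)
    merged = {}
    for k in set(e_i) | set(e_j):
        # A model missing from one list is replaced by that list's Max_e.
        merged[k] = e_i.get(k, max_i) + e_j.get(k, max_j)
    order = sorted(merged, key=lambda k: (merged[k], k))
    kept = sorted(order[:n_keep])
    # Models in neither list are bounded by max_i + max_j.
    max_e = merged[order[n_keep]] if n_keep < len(order) else max_i + max_j
    min_e = merged[order[0]]
    return [(k, merged[k]) for k in kept], max_e, min_e


list_i, max_i = build_list([5, 1, 3, 9], 2)
list_j, max_j = build_list([2, 8, 4, 0], 2)
list_ij, max_ij, min_ij = merge_lists(list_i, max_i, list_j, max_j, 2)
```

In this toy case the exact per-model sums are 7, 9, 7 and 9, so the exact minimum is 7, whereas the list-based minimum is 5: with short lists the merged error can be underestimated, which is the trade-off between list size and exactness.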

(2) Merging regions and merging lists: The merging step is illustrated in the block flow diagram of FIG. 3 and consists of the following sub-steps:

(a) Computing the merging scale. The minimal error Min_e(R_i∪R_j) is obtained during the computation of the errors for R_i∪R_j. Therefore the merging scale P(R_i, R_j) is directly obtained from the computation of the errors of R_i∪R_j: Given two regions R_i and R_j, the errors of the velocities V for the region R_i∪R_j are defined (block 300 of FIG. 3) from their inclusion in two lists (block 305 of FIG. 3) as follows:

If V is in L(R_i) and L(R_j) (block 320 of FIG. 3), then

$$e(R_i \cup R_j, V) = e(R_i, V) + e(R_j, V)$$

(block 325 of FIG. 3).

If V is in L(R_i) but not in L(R_j) (block 310 of FIG. 3), then

$$e(R_i \cup R_j, V) = e(R_i, V) + \mathrm{Max}_e(R_j)$$

(block 315 of FIG. 3).

If V is not in L(R_i) but is in L(R_j) (block 330 of FIG. 3), then

$$e(R_i \cup R_j, V) = \mathrm{Max}_e(R_i) + e(R_j, V)$$

(block 335 of FIG. 3).

This process keeps track of the minimal error, and if the latest error computed is less than the previously stored minimal error ("YES" branch of block 340 of FIG. 3), then the previously stored minimal error value is replaced with the latest computed error (block 345 of FIG. 3). Otherwise ("NO" branch of block 340), the process continues as before described until no V remains (block 350 of FIG. 3).

A V which is neither in L(R_i) nor in L(R_j) is not considered. However, its corresponding error can be set to Max_e(R_i)+Max_e(R_j). The elements of the lists are sorted with respect to the velocity model, and computational steps are saved if V is not in the lists.

(b) Merging the regions R_i and R_j. If R_i and R_j are selected to be merged (smallest merging scale among the neighbor pairs of regions), the two regions are merged and become a new region R. We associate to this new region the list of the "N(R)" smallest errors (block 355 of FIG. 3). The errors associated to R=R_i∪R_j are obtained as described above. The velocity models corresponding to the "N(R)" smallest errors are selected. The maximal error Max_e(R) is set as the next smallest error (block 360 of FIG. 3). All the selected models are sorted with respect to the model numbering in order to simplify the next merging. The list L(R) is then defined.

The block flow diagram of FIG. 4 illustrates in greater detail the updating operation of block 260 of FIG. 2. Referring to FIG. 4, the updating operation is given the pixel lists of the two regions R_i, R_j that are being merged together (block 400 of FIG. 4). The merging scales of the two individual regions are removed from the list of merging scales (block 420 of FIG. 4). The merging scales of the newly merged region R consisting of the union of R_i and R_j with all adjacent regions are computed (block 440 of FIG. 4). The newly computed merging scales are then added to the list of merging scales (block 460 of FIG. 4).

The computation of the merging scales of block 440 of FIG. 4 is illustrated in greater detail in the block flow diagram of FIG. 5. As illustrated in FIG. 5, the computation of block 440 requires the errors computed by the merge process of FIG. 3 at the end of the step of block 350. For each adjacent region, the cost of merging is computed (block 510 of FIG. 5) in accordance with the definition of C(R_i, R_j) stated previously in this specification. The gain of merging is also computed (block 513 of FIG. 5) for each adjacent region in accordance with the definition of G(R_i, R_j) stated previously in this specification. From the cost and gain for each adjacent region, the merging scale is computed (block 530 of FIG. 5) in accordance with the definition of P(R_i, R_j) previously given in this specification.

Using the Pyramidal Method

Rule of the maximal error: For any region R the list L(R) contains the "N" smallest errors and their respective velocity models. As for the error of a velocity model which is not in the list, such an error is necessarily larger than the maximal error Max_e(R). In such a case, in the merging procedure described above, all V which are not in the list L(R) are replaced with a quantity which is no greater than the true but unknown (unless directly computed) error E(R, V), namely the maximal error Max_e(R). In the case of a fixed list size, the replacement leads to an underestimate of the error for the merged region. But the criterion of merging is based on the minimal error, and the probability that the minimal error is made of a maximal error is weak. Moreover, the following considerations show that this approximation is justified:

Case of a flat region: A flat region, or a region containing few details, has a large number of possible matchings, and therefore its maximal error will be small. Such a region then has a big tendency to merge. Merging will occur with this region, until the region encloses details that reduce the number of possible matchings.

Case of a non-flat region: Conversely, a region containing sufficient detail to reduce its number of possible matchings has a large maximal error. Then, its value does not matter anymore since the minimal error computed during merging criteria checking will not be achieved by this value.

How to fix the size N(R) of the lists: It is clear that by setting the size N(R) of the list L(R) to the number of possible velocity models, an exact segmentation will be computed. And, conversely, for smaller N(R) the approximation of the errors will be rougher, and therefore the results of the segmentation will be rougher. Therefore a compromise between memory cost and accuracy has to be found. However, in some cases, it is sufficient to only store the errors which are "liable to contribute" to a minimal error when the region is merged with other regions, and still have an exact segmentation. For example, assume that the change between the two frames consists only of motions, and that the images are corrupted by an additive noise with a given variance σ², so that if V is the true (but unknown) velocity of a region R, then E(R, V) is statistically less than 2σ multiplied by the area of R. Therefore the size of the list of a given region will be fixed such that the maximal error stored is less than 2σ·Area(R). This choice of list sizes leads to exact minimization. The foregoing example illustrates the meaning of the phrase "liable to contribute" employed above. This example also illustrates that the size of the Lists can depend on the region. In spite of the foregoing, the sizes of all the lists may exceed the memory available, unless a limit is imposed on the list size.

Given two regions R_1 and R_2, the list of R_1∪R_2 is directly computed, and only from the lists of R_1 and R_2. Therefore, when merging occurs, the size of the list of the merged region will be less than the sum of the sizes of the Lists of the regions merged, which are erased. Therefore, the maximum memory used to store all the lists is reached at the initialization step.

Weakening pyramidality towards exact computation: If one chooses to limit the size of the lists but still wants to compute the exact minimization, one can weaken the pyramidality. The minimal error Min_e(R_i∪R_j) may be attained for a velocity model whose error is not in both lists L(R_i) and L(R_j). That means that e(R_i∪R_j, V) will be computed using the maximal error of one of the regions R_i or R_j. Thus, the computed errors differ from the true ones only when the maximal error is used. It is possible to recompute the list of a merged region R_i∪R_j if the minimal error is achieved by using the maximal error of the lists L(R_i) or L(R_j). In that case the list L(R_i∪R_j) will be computed directly from the images and not from the lists L(R_i) and L(R_j). But this partial loss of pyramidality in the computation of the list should occur in practice very rarely.

Implementation with increasing set of sets of models. Some models of velocities do not make sense when regions are as small as a single pixel. For example, the model of motion corresponding to a zoom of 10% between two images is to
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be considered for regions having size of at least 5 pixels 3. Merging the regions R ( - and R y : If the regions R ( and R, 

wide. Therefore there is no need to consider such motion at are selected to be merged, they give a new region R. We 

the beginning of the merging procedure when the regions are construct the list of the new region R by storing the 

one pixel wide. Therefore, it is preferred to begin the "N(R)" best initial value (the ones which give after 

merging procedure with a set of models containing the 5 i oca i minimization the smallest error), 

models appropriate at a region size corresponding to a single ^ ^ m of mis metnod ^ t0 deduce a g! o5al mm i mum 

pixel; then, when the segmentation stops, we consider a while avoiding re dundant computations, 

larger set of models and continue with the larger set. The Variants 

purpose of the sequential increase of the set of velocity TT . * , » ^ w . , . . . , 
*, t . 4 j j i . p i i j *u Using grey-scale segmentation first: It is possible to initial- 
models is to reduce the number of calculations and the in . r • . ^ . ^ . • . 

- , r t ij j 1U rze the regions by something other than the pixels. For 

required memory. For example one could consider sequen- . & . , r . 

tiallv the followin sets* example, one can use the results oi another segmentation, 

1 set of " te e tr latio &U °^ ^ a S re y _sca l e segmentation. If in a particular case 

^ " there is some information about the regions, it is possible to 

2. set of integer translations combined with a set of large ^ such infonnation in a p re . S eg m6 ntation process, 
zooms and rotations 15 Multi-resolution: In order to accelerate the process, and to 

3. a set that combines the preceding set and some smaller reduce the ^ of memoly> one can easi]y define a multi . 

zooms and rotations. resolution scheme from all the methods described above. 

Continuous Space of Velocity Models. Indeed) it reduces me of velodt models tQ be 

With a continuous space of velocity models, the problem considered on the fine grid level. One only has to consider 

is far more complicated than with a finite number of models. 20 , ne range obtained for me coarse grid at ^ corresponding 

Therefore, it is preferable whenever possible to quantify the pixel and its neighbors 

continuous set of models. However, when not possible or Multi . dillieMionjl data; The a i gorithms app i y ^ for 

preterable, mere are two ways to compute the reg.ons: multi-dimensional data. One just has to define the error 

Direct minimization. This is the preferred method and has terms on this daU> by for e k addin me emj& of each 

already been described in this specification under the 25 dimension 



heading "Direct minimization" 



Matching several frames with consistent regions: This vari- 



Lists methods. In general, methods that give the global ^ is sli ^ u more co kx man me & one H 

minimum which would provide the best result, are not we want to match one frame tQ several frames such ^ al , 

available However, there exist many numerical meth- me matchin have the same re ^ ons but mi ^ t have diflfer . 

ods that find some local, minimum around an initial 30 ^ fc velocities We then haye tQ aUach tQ a ^ a 

value. Away to deduce the global minimum from these ^ for each frame we want tQ match tQ ^ ^ ithm ^ me 

techniques is to minimize around many initial values same m , enns of ions> but we now haye to work with 
and take the minimum of the local minima found. For Usts for each re ^ Qn ^ cos( of m between 

mat we approximate the set of models, W by a discrete ^ re ^ ons is ^ m d b addin ^ minimal em)r 
set W ; . Each model of W, will be a imtial value for the 35 for each ^ mel ^ ^ ^ for ^ d 

minimizations. Therefore the minimum regions afe computed by merging the corresponding lists of 

the two regions. 

v£$ E{R ' V) Reduction of the Possibilities According to Some Global 

40 Criteria. 

will be approximated by f f 0 ^)^ "gi°f = The merging procedure 

rr * for adjacent regions (2-normal segmentation) described 

above can be used to merge non-adjacent regions. While the 

min minjocai^tft*, V)) regions have been estimated, the number of remaining 

45 regions is in general small (it should correspond to the 
number of moving objects), and therefore we can consider 
where min_local W r stands for the local minimizer in the all pairs of regions for a non-connected merging. The 
space W. The problem is that W, should be large enough algorithm of merging is exactly the same than that described 
have a good approximation. The computation of the mini- above except that we consider all pairs of regions instead of 
mum involves many uses of the local minimize rs which are 50 all pairs of adjacent regions. The output of this step is then 
costly in term of computations. The minimum for the two a set of regions not necessarily connected and made of 
regions R ( and R f has a high probability of being achieved unions of some connected regions obtained in the 2-normal 
around a local minima of R, and R /t Therefore, it is preferred segmentation. Each of these regions consists of the con- 
to consider only the best initial values of R, and the best nected regions that can move with the same velocity model, 
initial values of R y in order to initialize the local minimizer 55 As before, this produces a reduction of the possible models 
for R.URy. Then, the algorithm is the following of velocity for each region, since it will choose those which 

1. Initialization: For each initial region R, compute around are compatible with other regions. 

initial values in W, the local minimum and its respec- Grouping non-adjacent regions that have less disparities 

tive error. Store in the list L(R) the "N(R)" initial values in terms of velocity models: If merging non-adjacent regions 

that give the smallest error, and their respective error, en having pixels that can move with the same velocity model is 

Set Min e (R) to the minimal error found. not sufficient for uniqueness, one can still choose for each 

2. Computing the merging scale: Given two regions R t region the models that have less disparity with the models of 
and R ; , for each initial value of the lists L(R £ ) and L(R y ), the other regions. The same algorithm can be applied, but 
compute the local minimum of the error for RfUR,. Set with a different error definition. Instead of relying on the 
Min^RjURy) to the minimal error found among all the 65 error in grey level, we define the cost of merging as a 
initial values. The merging scale is computed from function of the minimal difference between their possible 
Min^RjURy) as before. velocities. This assumes the choice of a norm N in the space 
of the velocity models. The cost of merging regions R_i and
R_j is then

$$\min_{V \in W(R_i)} \; \min_{V' \in W(R_j)} N(V - V')$$

where W(R) is the list of possible velocity models of the
region R, and N denotes a norm on the space of the velocity
models. When merging regions, only the models that minimize
the cost are retained as possible velocity models.
Therefore, each merging reduces the number of possible
models for the considered regions.
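
The velocity-disparity cost above can be illustrated with a minimal Python sketch. It assumes that each velocity model is represented by a tuple of parameters and that N is the Euclidean norm on those parameters; both choices, and the function name, are assumptions of the sketch and are not fixed by this specification.

```python
import math


def merge_by_velocity_disparity(models_i, models_j, norm=None):
    """Return (cost, kept_i, kept_j) for two lists of candidate models.

    cost is the smallest N(V - V') over V in models_i and V' in models_j.
    kept_i and kept_j are the models that achieve this cost, i.e. the
    models retained as possible velocities after the merge.
    """
    if norm is None:
        # Assumption: Euclidean norm on the model parameter vector.
        def norm(v):
            return math.sqrt(sum(c * c for c in v))

    tol = 1e-12
    best = math.inf
    pairs = []
    for v in models_i:
        for w in models_j:
            d = norm(tuple(a - b for a, b in zip(v, w)))
            if d < best - tol:
                best, pairs = d, [(v, w)]
            elif abs(d - best) <= tol:
                pairs.append((v, w))
    kept_i = sorted({v for v, _ in pairs})
    kept_j = sorted({w for _, w in pairs})
    return best, kept_i, kept_j
```

With two candidate translations per region, only the closest pair survives the merge, which is how each merging step shrinks the candidate lists.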
Occlusions.

The process of determining which pixels are occluded is
illustrated in the block flow diagram of FIG. 6. Occluded
points are the points rejected by the process of FIG. 6. This
process uses X_1 and X_2, the sets of the pixels of the first and
second images, respectively (block 600 of FIG. 6):

Step 0. Initialization: Set X_1^0=X_1, X_2^0=X_2 (no points
occluded.)

Step 1. Match forward (block 610 of FIG. 6): Match (via
velocity) X_1^n toward X_2^n, and decompose the set X_2^n
into the two sets: H_2 (block 620 of FIG. 6) and O_2. Set
X_2^{n+1}=H_2 (block 630 of FIG. 6) and reject from X_2^n the
points of O_2.

Step 2. Match backward (block 640 of FIG. 6): Match (via
velocity) X_2^{n+1} toward X_1^n, and decompose the set X_1^n
into the two sets: H_1 (block 650 of FIG. 6) and O_1. Set
X_1^{n+1}=H_1 (block 660 of FIG. 6) and reject from X_1^n the
points of O_1. If X_1^{n+1}≠X_1^n or X_2^{n+1}≠X_2^n ("NO" branch
of block 670), then loop to step 1 (block 610). Otherwise
("YES" branch of block 670), stop.

The final sets X_1^n and X_2^n define the non-occluded parts of
the first and the second images, respectively. Their complementary
sets define the points of the first image which are
occluded in the second image, and the points of the second
image which are occluded in the first image, respectively.

The algorithm stops in a finite number of iterations. The 
number of iterations is bounded by the number of pixels of 
an image. The algorithm does not use the complete bijective 
structure of the velocity of the non-occluded parts. 
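
The forward/backward iteration of FIG. 6 can be sketched in Python as follows. The sketch assumes that matching is supplied as two hypothetical callables, each taking a source set and a destination set and returning the destination pixels that are reached (the sets H_2 and H_1); the one-dimensional `shift_matcher` is only a toy stand-in for the velocity estimation process and is not part of this specification.

```python
def find_non_occluded(x1, x2, match_forward, match_backward):
    """Iterate forward and backward matching until both sets are stable.

    Returns the final non-occluded sets of image 1 and image 2.
    """
    cur1, cur2 = set(x1), set(x2)
    while True:
        # Step 1: keep the pixels of image 2 reached from image 1 (H_2).
        next2 = cur2 & match_forward(cur1, cur2)
        # Step 2: keep the pixels of image 1 reached from the new X_2 (H_1).
        next1 = cur1 & match_backward(next2, cur1)
        if next1 == cur1 and next2 == cur2:
            return cur1, cur2
        cur1, cur2 = next1, next2


def shift_matcher(shift):
    """Toy matcher (assumption): every pixel moves by a constant shift."""
    def match(src, dst):
        return {p + shift for p in src if p + shift in dst}
    return match


keep1, keep2 = find_non_occluded(
    range(5), range(5), shift_matcher(+1), shift_matcher(-1)
)
occluded_in_2 = set(range(5)) - keep1  # pixels of image 1 hidden in image 2
occluded_in_1 = set(range(5)) - keep2  # pixels of image 2 hidden in image 1
```

Because each pass can only remove pixels, the sets shrink monotonically, which is consistent with the bound on the number of iterations stated above.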

Linking occlusions to regions. After having found the
occlusions by the process described above, the process can
try to attach occlusion points to non-occluded ones, and
then guess about their possible motion. The idea is to merge 
an occluded region to a non-occluded region if the first one 
can move with the second one. As employed herein, the term 
"can move" means that the shifted location of the occlusion 
region by one of the possible models of velocity of the 
non-occluded region is covered by another moving region. 
It reduces the number of possible velocity models for the 
non-occluded region. Indeed, not necessarily all the possible 
models of the non-occluded region would send the occluded 
region under another moving region. If the merging is done, 
such models will be removed from the list of the possible 
ones. 

Improvement of the Velocity Estimates Within Each
Region.

Once the regions have been computed, a model of veloc- 
ity is attached to each region, as described above. This 
model is an approximation of the velocities within the 
region. Now, the knowledge of the region plus the approxi- 
mation of the velocity enables a refinement of the velocities 
into a wider space of functions than those formed by the 
models. There are a number of conventional techniques that 
can perform such a refinement of the velocities. 



Frame Fusion. 

Introduction to frame fusion. Frame fusion merges
frames of a sequence in order to produce an image that has
a higher resolution than the original frames. This image is
called the "super-frame". Due to motion, frames in a moving
sequence contain different information about the scene. By
superimposing the different information, it is possible to
increase the resolution. Fusion of the different data contained
in the different frames requires registration of all the
objects in motion. It also requires a knowledge of the
occlusions, except where there is no occlusion, or only
global motion is assumed. The information of the velocities
and the occlusions is a result of the velocity estimation
process described above. The frame fusion process can
assume the following forms:

Creating a super-frame. One chooses a frame of the 
sequence as the reference frame, and frame fusion is 
performed with reference to the time of the chosen 
reference frame. The chosen frame is used to register 
objects at the frame time of the reference frame. The
super-frame will look like the chosen frame but with a 
higher resolution. 
Creating a super-sequence. One can also apply the process 
described just above for all frames of the sequence 
successively. We then obtain the entire sequence with a 
higher resolution for each frame. 
In the following, we will only describe the process for the 
creation of a single super-frame since the creation of the 
super-sequence is a repetition of this process for all frames. 
Matching and Collection of Data.

Let u_i, i∈{1, . . . , N} be a sequence of images and u_{i_0} the
chosen reference frame. The "collection of data" step consists
in determining for each pixel of the other frames whether
or not it is occluded at the chosen frame and, if not, its
position in this frame. This is expressed in a data list made
from all the pixels of the sequence. To each pixel p of the
sequence, we associate a datum d_p that contains three
components:

a position x: The position is the estimated floating point
location of the pixel at the chosen time.

a grey-level u: The grey-level is the grey-level of the pixel
in the other frame.

an indicator of occlusion o: The indicator of occlusion is
set to zero (0) if the point is found occluded at the
chosen time, or otherwise it is set to one (1).
The field position is relevant only if the indicator of occlusion
is 1. The datum is computed for all the pixels of the
other frame and added to a data list.

Construction of the list from the pixels of the chosen
reference frame. Since the chosen reference frame corresponds
to the scene at the chosen time, the location of each
of its pixels at that time is its location in the reference
frame. The indicator of occlusion is set to 1 for all the
reference frame pixels since by definition they are all present
(not occluded) in the reference frame.

Construction of the list from the pixels of a frame i other
than the chosen reference frame. A determination is made for
each pixel of the frame i whether or not it is occluded in
frame i_0. If it is not occluded, its velocity from frame i to
frame i_0 is determined. If a pixel p of frame i is found
occluded, the occlusion field o of its datum is set to 0, and
its other fields are suppressed. If it is not occluded, then its
location field is set to its location in the frame i plus its
estimated velocity from the frame i to the frame i_0. The grey
level field of its corresponding datum is set to its grey-level
in the frame i, and its occlusion field is set to 1. Repeating
these operations for all the frames of the sequence, we obtain
a list of data that contains position, grey-level, and whether
or not the point is occluded for all the pixels of the sequence.
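
The construction of the data list can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is a dictionary mapping an integer pixel location to a grey-level, and `match` is a hypothetical callable standing in for the velocity estimation process, returning a velocity tuple or `None` when the pixel is occluded in the reference frame. The names `Datum`, `collect_data` and `match` do not appear in this specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Datum:
    x: Optional[Tuple[float, float]]  # estimated position at the chosen time
    u: Optional[float]                # grey-level in the pixel's own frame
    o: int                            # 1 if not occluded, 0 if occluded


def collect_data(frames, ref_index, match):
    """Build the data list for every pixel of every frame."""
    data = []
    for i, frame in enumerate(frames):
        for pos, grey in frame.items():
            if i == ref_index:
                # Reference-frame pixels are present by definition.
                data.append(Datum((float(pos[0]), float(pos[1])), grey, 1))
                continue
            velocity = match(i, ref_index, pos)
            if velocity is None:
                # Occluded: the position and grey-level fields are suppressed.
                data.append(Datum(None, None, 0))
            else:
                moved = (pos[0] + velocity[0], pos[1] + velocity[1])
                data.append(Datum(moved, grey, 1))
    return data
```

The sub-list used later for building the super-frame is then simply the entries whose `o` field is 1.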

Construction of the super-frame. For the construction of
the super-frame, the process only considers the sub-list of
the pixels that are not occluded in frame i_0 (o field not equal
to zero). The process constructs a first version of the
super-frame by interpolating weighted grey-level values of
the pixels of the list L_0 according to the distance to the
considered pixel. The weight of the interpolation decreases
exponentially with respect to the distance. However the
speed of decrease depends on a "density" function. The
density measures the number and distance of the pixels of L_0
from the considered pixel. The density is large if there are
many pixels for which the distance is small. The larger the
density, the faster is the decrease of the weight with respect
to the distance.
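
A minimal Python sketch of this density-controlled interpolation is given below. It assumes that the registered pixels near the considered super-frame pixel have already been gathered into `samples`, a list of (position, grey-level) pairs. The cap on the decay rate (`cap`) and the division by the sum of the weights are assumptions of the sketch, introduced so that the result is a weighted average.

```python
import math


def density(p, samples):
    """Sum of exp(-distance) over the registered pixels near p."""
    return sum(math.exp(-math.dist(p, pos)) for pos, _ in samples)


def interpolate_grey(p, samples, cap=4.0):
    """Grey-level at p as a distance-weighted average of nearby samples.

    The decay rate grows with the local density, so that dense
    neighbourhoods rely mostly on their closest samples.
    """
    rate = min(density(p, samples), cap)  # assumption: rate capped at `cap`
    weights = [math.exp(-math.dist(p, pos) * rate) for pos, _ in samples]
    total = sum(weights)
    # Assumption: normalize by the sum of weights to obtain an average.
    return sum(w * grey for w, (_, grey) in zip(weights, samples)) / total
```

Two samples at equal distance contribute equally, while moving the considered pixel toward one of them pulls the result toward that sample's grey-level.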

Deblurring step. In the preceding step, a grey level value
is given to each pixel in a first estimate of the super-frame
u. These values have been estimated by using a weighted
interpolation between grey-level values of the pixels of the
sequence. The pixels of the sequence have larger physical
size than the pixels of the super-frame (due to the zoom).
Therefore their grey-level values correspond to a larger
physical size than the size of a pixel of the super-frame. As
a consequence the image u is a blurry version of the
super-frame U where the blurring kernel is set by the size of
the zoom (which coincides with the increase in resolution
from the chosen frame to the super-frame). The shape of the
blurring kernel depends on the sensors that captured the
sequence. However, in general the average is a local uniform
average or a Gaussian-weighted one. This implies that the
super-frame U is linked to its first estimate by the relation:



u=G*U

where G is the kernel corresponding to the average. Now,
due to noise and other perturbations it is better to consider
the last equality as being at least nearly true everywhere, so
that

$$\int (u - G * U)^2 \le \sigma_n$$

where σ_n is the variance of the assumed noise. Therefore U
is given by an inverse problem which is not necessarily
well-posed because of its non-uniqueness. Then, among the
possible solutions, we choose the one which has the smallest
total variation, so that U is defined by the function that
achieves the minimum of

$$E(U) = \int |\nabla U|$$

subject to the constraint defined by the inequality

$$\int (u - G * U)^2 \le \sigma_n.$$
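
The two quantities involved in this constrained minimization can be evaluated with a short Python sketch. It works on one-dimensional signals and assumes a uniform (box) kernel of odd width with clamped borders; these simplifications, and all function names, are assumptions of the sketch. It only evaluates the energy and the constraint and does not solve the minimization.

```python
def total_variation(signal):
    """Discrete total variation: sum of absolute neighbour differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def box_blur(signal, width):
    """Uniform average over `width` samples (odd), with clamped borders."""
    n, half = len(signal), width // 2
    blurred = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - half, i + half + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def satisfies_constraint(u, candidate, width, sigma_n):
    """Check that the squared residual of u - G*U does not exceed sigma_n."""
    residual = sum((a - b) ** 2
                   for a, b in zip(u, box_blur(candidate, width)))
    return residual <= sigma_n


sharp = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
noisy = [0.0, 0.2, 0.0, 1.0, 0.8, 1.0]
u = box_blur(sharp, 3)  # first estimate: a blurred version of `sharp`
```

Among candidates that satisfy the constraint, the one with the smaller total variation (here `sharp` rather than `noisy`) is the one the criterion prefers.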

The Frame Fusion Process.

The general plan of the frame fusion process is illustrated
in the block diagram of FIG. 7 and consists of the following
three steps: (1) the collection of data, (2) the construction of
a first version of the super-frame, and (3) deblurring. The
inputs to this process are:

A sequence (700 in FIG. 7) of images: u_0, . . . , u_N.
Selection of a particular reference frame (710 in FIG. 7):
u_{i_0}.

Factor of Zoom: Zoom_x and Zoom_y for the zooms in the
x and y directions.

The inputs of the velocity estimation algorithm described
above in this specification. These inputs consist of a set of
velocity models and a scale parameter.



The outputs from this process are: 

The super-frame (720 in FIG. 7): U is the super-frame that
corresponds in time to the chosen frame u_{i_0}.
The major steps of the frame fusion process are depicted in 
the block flow diagram of FIG. 8 and will now be described 
with reference to FIG. 8. 

Collection of data. The collection of data (block 800 of
FIG. 8) corresponds to a registration of non-occluded pixels
with the chosen frame. For a pixel p, we call p.x its location
in its frame, p.i the number of its frame, and u(p.x,p.i)
its grey-level.

Step 1: Collecting pixels of the frame i_0. For a pixel p of
the frame i_0 (p.i=i_0), we set its datum d to
d.x=p.x
d.u=u(p.x, i_0)
d.o=1

Step 2: Collecting pixels of the other frames. For each
frame i≠i_0:

Step 2.1: We match frame i to frame i_0 with the algorithm
described above in this specification. The output of this
matching is, for each pixel of the frame i:
whether or not it is occluded in frame i_0,
and, if it is not occluded, its velocity between frame i
and frame i_0: V_{i,i_0}(p).

Step 2.2: While the matching is done, for each pixel p of
the frame i: If the pixel p is occluded, then we set its
datum d to
d.x= (not set)
d.u= (not set)
d.o=0

And, if it is not occluded, to
d.x=p.x+V_{i,i_0}(p)
d.u=u(p.x,p.i)
d.o=1

Step 3: Organization of the data. The aim of this step is to
allow a fast access to the data: given a pixel of the
chosen frame, we want to access quickly all the pixels
of the sequence of frames. The plan of the data structure
for this step is illustrated in the block diagram of FIG.
9 and will now be described with reference to FIG. 9.
We consider the sublist (block 900 in FIG. 9) of the data
corresponding to the pixels that are not occluded in the
frame i_0 (d.o≠0). This sub-list, called L_0, has a number
of entries less than or equal to the number of pixels in the
sequence of frames. Let us call N_0 its size. We define
three arrays: Ca, In (block 940 in FIG. 9), and Po (block
930 in FIG. 9). Ca and In (block 940) are integer arrays
of the size of one frame. Po (block 930) is an integer
array with size N_0. Ca will contain for each pixel of the
chosen reference frame the number of pixels of L_0 that
are going in its area. This number is at least equal to 1
since the pixel itself is in the list L_0 (block 920). In
(block 940) is an index linked to the array Po (block
930). Po is an array of pointers on the list L_0. Po and
In are organized so that for the pixel number j, Po[k] for
k between In[j] and In[j+1]-1 are the entry numbers of
the elements of L_0 that have their location on the pixel
j.
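
The layout of Ca, In and Po described in Step 3 resembles a counting-sort (bucket) index, and can be sketched in Python as follows. The sketch assumes that each entry of L_0 has already been reduced to the index j of the reference-frame pixel containing it, and it gives the index array one extra trailing element so that In[j+1] is defined for the last pixel. The names `build_index`, `entries_at` and `ind` (used in place of In) are assumptions of the sketch.

```python
def build_index(pixel_of_entry, num_pixels):
    """Build the index arrays from the pixel number of each L_0 entry.

    pixel_of_entry[n] is the reference-frame pixel j that entry n falls in.
    Returns (ind, po) such that po[ind[j]:ind[j + 1]] lists the entry
    numbers located on pixel j.
    """
    # Ca: how many entries fall on each pixel.
    ca = [0] * num_pixels
    for j in pixel_of_entry:
        ca[j] += 1

    # In: running sum of Ca, giving the start of each pixel's block in Po.
    ind = [0] * (num_pixels + 1)
    for j in range(num_pixels):
        ind[j + 1] = ind[j] + ca[j]

    # Po: Ca is reset and reused as a per-pixel fill cursor.
    ca = [0] * num_pixels
    po = [0] * len(pixel_of_entry)
    for n, j in enumerate(pixel_of_entry):
        po[ind[j] + ca[j]] = n
        ca[j] += 1
    return ind, po


def entries_at(ind, po, j):
    """Entry numbers of L_0 registered on reference-frame pixel j."""
    return po[ind[j]:ind[j + 1]]
```

Building the index takes a constant number of passes over L_0, and each later lookup is a single slice, which matches the stated aim of fast access.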

Construction of Ca. The array Ca is initialized to 0. Then
the following loop is performed on the elements p of L_0: We
add 1 to Ca[p.x] (Ca at the integer value of the position p.x).
At the end of the loop, Ca associates to each position (in pixels)
the number of pixels of the sequence that are
registered with that pixel of the chosen reference frame.

Construction of In (940). We set for each i≥0:
$In[i] = \sum_{j<i} Ca[j]$.
Construction of Po (930). The array Ca, which is not useful
anymore, is set to 0 and is used for another purpose. One
more loop is performed on the elements of the list L_0: For
a pixel p having a position d.x in frame i_0 belonging to pixel
j, we set Po[In[j]+Ca[j]]=n, where n is its entry number on
the list L_0. We then add 1 to Ca[j]. When the loop is done,
the array Ca is erased.

Once all these operations are performed, the access to the
pixels that are registered with a particular pixel p of the
chosen frame is as follows: If i is the number of the pixel p
and k an integer, then these pixels are in the list L_0 at the
positions Po[k], for k between In[i] and In[i+1]-1.

Construction of the Super-Frame (820 of FIG. 8).

The collection of data step produces a list of pixels with
non-integer locations. From this data we have to construct an
image by attributing a grey-level value to pixels that have an
integer location.

Step 0: Rescaling. Rescaling is performed if the positions
of pixels on the list are in the scale of a frame of the
sequence, or, in other words, the super-frame corresponds
to a zoom version of this frame. We therefore
have to re-scale the position of the pixels of the list L_0
to positions on the super-frame. We multiply by zoom_x,
zoom_y the location in x, y, respectively, of the pixels of
the list L_0.

We process a loop on the pixels of the super-frame: p.

Step 1: Estimation of the density around p. The pixel p of
the super-frame is contained in a pixel p' of the chosen
frame. We consider the neighbourhood of p' made by its
connected pixels: p'_1, . . . , p'_9. We define the density
around the pixel p as

$$d(p) = \sum_{k=1}^{9} \; \sum_{In[p'_k] \le j < In[p'_k+1]} \exp\bigl(-\mathrm{dist}(L_0[Po[j]].x,\; p.x)\bigr)$$

where dist(.,.) is a distance. It measures in the expression the
distance between the position of the pixel p of the super-frame
and the position of the pixel Po[j] of the list L_0
(rescaled in step 0). If this density is small, that means that
there is no pixel on the list L_0 near the pixel p, and
conversely if the density is large that means that there are
many pixels that are near. Note that the density cannot be
equal to zero.

Step 2: Estimation of the grey-level of the pixel p. The
grey-level value is set by the following formula:

$$u(p) = \sum_{k=1}^{9} \; \sum_{In[p'_k] \le j < In[p'_k+1]} L_0[Po[j]].u \; \exp\bigl(-\mathrm{dist}(L_0[Po[j]].x,\; p.x)\,\sigma(d(p))\bigr)$$

where σ(d(p)) is given by

σ(d(p))=min(d, M)

The grey-level value is estimated by using an interpolation
between the grey-levels of the neighbour pixels that are
in the list L_0. The interpolation is weighted by the distance.
The nearer a pixel, the greater its weight in the interpolation.
The weight decreases with respect to the distance in an
exponential way which is parametrized by σ. The quantity σ
is directly linked to the local density, so that if there exist
many neighbour pixels, the decrease of the weight will be
large.

The Deblurring Step (840 of FIG. 8).

The deblurring step has been discussed above in this
specification and is required because the pixels of the
sequence of image frames have larger physical size than the
pixels of the super-frame. As a result, the image u is a blurry
version of the super-frame U, where the blurring kernel G is
set by the size of a pixel of the sequence in the finer grid
defined by the super-frame. The super-frame is constructed
from u by minimizing the following energy, so that U
corresponds to a minimum:

$$E(U) = \int |\nabla U|$$

subject to the constraint

$$\int (u - G * U)^2 \le \sigma_n$$

The minimization is performed by solving the Euler-Lagrange
equations corresponding to the energy and constraint
in accordance with conventional techniques.

Advantages of the Combination of the Velocity Estimation
Process and the Frame Fusion Process

There are two principal reasons that the velocity estimation
process described earlier in this specification provides
great advantage when its output is used in the frame fusion
process. First, the possible errors of matching have little
effect on the quality of the super-frame. Second, the method
of determining the occlusions is compatible with the function
of frame fusion. The errors of matching attributable to
the choice of the eventual parameters, noise or any perturbation
do not create artifacts in the super-frame.

The possible errors of matching have virtually no effect on
the super-frame. Any matching procedure can make errors.
And errors in the velocity estimate can produce artifacts in
the frame fusion superframe. A wrong estimate of velocity
can cause a merging of different objects into one. Since it is
impossible to guarantee that any algorithm that estimates
velocity is error-free, it is advantageous to reduce or eliminate
the effects of such errors in the superframe. The velocity
estimation process described above in this specification
ensures that the results will be between the chosen frame (in
case of many velocity estimate errors) and the optimal
super-frame (in case of little or no velocity estimate errors).

Dependence of the results of the frame fusion with
respect to the scale parameter of the velocity estimates:
Selecting the ideal scale, that is the scale that provides the
best velocity estimate, is difficult. If the scale is too large,
then only the motion of large objects is retained, and the risk
is that the process might merge distinct small objects into a
single object, thereby introducing artifacts into the super-frame.
On the other hand, if the scale is too small, which
corresponds to over-segmentation, all the moving objects
are found, but the small size of the obtained regions does not
necessarily provide a unique velocity estimate. Therefore
the chosen velocity might be different from the true velocity,
but since it belongs to the set of all possible velocities, the
regions will match to a region which is similar. In other
words, if a region has "N" different possible velocities, there
exist in the other frame "N" similar regions. Each of the
possible velocities associates the region to one of the "N"
similar regions of the other frame. Therefore in terms of
fusion, if the chosen velocity is not the true one, the process
fuses the region to one which is similar. The consequence is
that there will be no improvement in image quality, but there
will be no degradation either. If no regions are merged, the
result of the frame fusion will be the interpolation of the
chosen frame in the finer grid. In summary, a scale parameter
which is too large may induce artifacts, while one that is too
small only degrades the quality of the optimal super-frame
towards the original quality of the chosen frame. As the scale
increases from 0 to ∞, the quality increases from the original
quality of the chosen frame up to a maximum quality (limit
of over-segmentation) and then artifacts will appear (under- 
segmentation). As discussed previously in this specification, 
the scale parameter can be set to the variance of the noise, 
which is the preferred choice. 

The effect of the selection of the set of velocity models: The
result of the velocity estimation depends upon the choice of
the set of velocity models. If the set of models is too small 
(it cannot be too large if there are no memory allocation 
constraints), then there is over-segmentation, causing a 
degradation of the quality towards the resolution of the 
chosen frame. But, no artifacts are introduced into the 
superframe. Moreover, the set of the translations is suffi- 
ciently large to provide superior results. 
Advantages Relating to the Occlusion Process. 

Determination of the occlusion pixels between the chosen 
reference frame and each frame in the sequence is preferable 
because the frame fusion process should be prevented from 
collecting pixel data of occluded pixels. In order to decide 
if a pixel or a region is or is not occluded, conventional
techniques rely upon the error of matching with the best 
velocity for each pixel. If the error is large (above some 
threshold), the pixel (or region) is classified as occluded. If 
the threshold is set too large, some occluded pixels will 
escape classification as such and therefore will be errone- 
ously included in the super-frame, thereby introducing an 
artifact. Conversely, if the threshold is too small, non- 
occluded pixels may be erroneously classified as occluded, 
so that there will be little or no improvement in image
quality. In order to realize an improvement in image quality 
in conventional techniques, there must be at least some 
relatively large errors. Otherwise, if all of the errors are 
small, there is little or no difference between the frames, and 
the result is a super-frame at least nearly identical to the 
chosen reference frame except for the zoom factor. This 
problem is solved in the present invention by determining 
occluded pixels using a criterion independent of the error of
matching. As described above in this specification, the 
occlusion process of the invention defines occlusions in a 
very different way. Specifically, occlusions are defined by 
finding pixels for which no match is found between the 
chosen frame and a given frame in the sequence. Thus, 
occlusions are defined in the process of the invention by the 
matching process and not by any threshold based on the 
errors of matching. 
Summary of a Preferred Process: 

A preferred embodiment of the invention is a process for
obtaining information from at least two image frames of a 
sequence of frames, each of the frames including an array of 
pixels, each pixel having an amplitude, one of the two 
frames being designated as a reference frame and the other 
being a non-reference frame, the process including the 
following features which will be enumerated with reference 
to applicable reference numerals in the drawings of FIGS. 
1-9: 

(1) defining a set of velocities with which the motion of 
pixels between the two frames may be modeled; 

dividing each one of the two frames into plural regions 
(block 200 of FIG. 2); 

(2) determining an error for each one of at least some of 
the velocities by carrying out the following steps for 
each one of the regions and for each union of pairs of 
the regions: 

(A) mapping each pixel of the non-reference frame into 
the reference frame in accordance with the one 
velocity (block 300 of FIG. 3), 

(B) computing an error amount which is a function of 
a difference in pixel amplitude attributable to the 
mapping (block 325 of FIG. 3); 
(C) designating a minimum one of the error amounts
computed for the velocities as the error for the one 
velocity, whereby a respective error is associated 
with each of the regions and with each union of pairs 
of the regions without regard to velocity (blocks 340 
and 345 of FIG. 3); and 
(3) merging qualified ones of the regions by the following 

steps: 

(A) computing for each pair of regions a merging scale 
which depends upon a gain including a function of 

(a) the sum of the errors of each pair of regions and 

(b) the error of the union of the pair of regions (block 
440 of FIG. 4);

(B) merging each pair of the regions for which the 
merging scale meets a predetermined criteria (block 
240 of FIG. 2). 

The merging scale preferably depends also upon a cost 
including a function of (a) the sum of the lengths of the 
boundaries of each pair of regions and (b) the length of the 
boundary of the union of the pair of regions. 

The step of determining an error for each one of at least 
some of the velocities can include determining an error for 
each one of all of the velocities. 
The process can further include, after the step of merging: 
erasing the individual pairs of regions which have been 
merged and defining their unions as individual regions 
(block 420 of FIG. 4); and 
repeating the steps of (a) computing an error for each one 
of at least some of the velocities, (b) computing a 
merging scale and merging each pair of regions for 
which the merging scale meets the criteria (block 440 
of FIG. 4), whereby the process includes plural repeti- 
tive iterations. 

Preferably the step of determining an error for each one of 
at least some of the velocities includes determining the error 
for a limited set of the velocities, the limited set of the 
velocities corresponding to those velocities associated with 
the N smallest errors computed during a prior iteration of the 
process, wherein N is an integer (block 355 of FIG. 3). 

If each limited set of velocities associated with the N
smallest errors is different for different regions, then the step 
of determining an error includes: 

designating as the maximum error for a given region the 
largest error computed for that region in any prior 
iteration of the process (block 360 of FIG. 3); 
and the step of computing the merging scale includes 
determining for each pair of regions whether a velocity 
included in the limited velocity set of one of the regions 
is not included in the limited velocity set of the other of 
the pair of regions, and assigning as the corresponding 
error for the other region the maximum error. 
The mapping includes computing a new pixel amplitude 
in accordance with a weighted average of pixel amplitudes 
mapped into the reference frame, wherein the weight of each 
pixel amplitude mapped into the reference frame is a 
decreasing function of the mapped pixel's distance from a 
given pixel location in the reference frame. 

The mapping step of mapping pixels from the non- 
reference frame to the reference frame is a forward mapping, 
and the process can further include determining which ones 
of the pixels are occluded by carrying out the following 
steps: 

(I) determining which pixels were not matched from the 
non-reference frame to the reference frame by the 
merging step following the forward mapping step and 
removing the pixels not matched from the reference 
frame (blocks 610 and 620 of FIG. 6); 
(II) performing the step of determining an error and the
step of merging, except that the mapping step includes 
a backward mapping of mapping from the reference 
frame to the non-reference frame, the backward map- 
ping step employing a version of the reference frame in 
which the pixels not matched have been removed 
(blocks 630 and 640 of FIG. 6);

(III) determining which pixels were not matched from the 
reference frame to the non-reference frame by the 
merging step following the backward mapping step and 
removing the pixels not matched from the non- 
reference frame (blocks 650 and 660 of FIG. 6);

(IV) comparing the pixels remaining in the reference 
frame with the pixels remaining in the non-reference 
frame, and repeating steps I, II and III if there is a 
difference beyond a predetermined threshold. 

The process assigns a velocity to each remaining pixel in 
the non-reference frame and then adds the remaining pixels 
of the non-reference frame to the reference frame in accor- 
dance with the velocity assigned to each pixel of the 
non-reference frame to produce an enhanced frame. 

The process can further include deblurring the image of 
the enhanced frame to produce a super frame (block 840 of 
FIG. 8). 

The dividing step can initialize the regions so that each 
pixel is an individual region. 

The model velocities include at least one of: the set of 
translational velocities, the set of rotational velocities, or the 
set of zooms. 

Preferably, the unions of pairs of regions constitute unions 
of pairs of adjacent regions only. 

The merging scale is computed as a ratio obtained by 
dividing the cost by the gain, and wherein the predetermined 
criteria includes a maximum scalar value of the ratio above 
which merging is disallowed (blocks 510, 513 and 530 of 
FIG. 5). The scalar value is selected in a range between an 
upper limit at which the entire image is merged and a lower 
limit at which no pixels are merged. 

The process further includes defining the set of velocities 
as a simple set during the first one of the iterations of the 
process, and supplementing the set of velocities with addi- 
tional velocities as the size of the regions grows. Preferably, 
the simple set includes the set of translational velocities, and 
the additional velocities include the set of rotational veloci- 
ties and the set of zoom velocities. 

The reference and non-reference frames can lie in a
moving sequence of frames depicting an image having 
motion, the process further including designating one of the 
sequence of frames as the reference frame and successively 
designating others of the sequence of frames as the non-
reference frame, and performing all of the foregoing steps 
for each one of the successive designations of the non- 
reference frame, whereby the superframe contains informa- 
tion from all the frames of the sequence. Furthermore, the 
process can further include designating successive ones of 
the sequence of frames as the reference frame and repeating 
all of the foregoing steps for each designation of the refer- 
ence frame so that a super frame is constructed for each one 
of the sequence of frames. 

The step of assigning a velocity to each remaining pixel 
can be carried out by selecting the velocity for the region of 
that pixel having the minimum error. 
Applications: 

The invention has a number of uses. First, the construc- 
tion of an enhanced image or "superframe" is achieved as 
described. Secondly, the process may be used to stabilize 
images of stationary objects. Third, the process may be 
employed to interpolate frames to smooth out a slow motion
sequence. Fourth, the superframe constructed by the invention
may be employed to convert between different video
formats having different numbers of horizontal lines per
frame. For example, in converting from the PAL video 
format to the NTSC video format, or from NTSC to high 
definition video, the number of horizontal lines must be 
increased, which can be accomplished by using the super- 
frame constructed by the invention. Fifth, the invention can 
be employed in video compression by providing superior 
predictions of pixel velocities between frames. Sixth, the 
velocity estimates provided by the invention can be 
employed in three-dimensional scanning as an improved 
measure of depth out of the image plane. Finally, the 
region-by-region velocity estimates can be employed for 
object removal, to remove either all moving objects or
all stationary objects.

While the invention has been described in detail by 
specific reference to preferred embodiments, it is understood 
that variations and modifications thereof may be made 
without departing from the true spirit and scope of the 
invention. 

What is claimed is: 

1. A process for obtaining information from at least two 
image frames, each of said frames comprising an array of
pixels, each pixel having an amplitude, one of said two
frames being designated as a reference frame and the other
being a non-reference frame, said process comprising:

(1) defining a set of velocities with which the motion of
pixels between said two frames may be modeled;

(2) dividing each one of said two frames into plural
regions;

(3) determining an error for each one of at least some of
said velocities by carrying out the following steps for
each one of said regions and for each union of pairs of
said regions:

(A) mapping each pixel of said non-reference frame
into said reference frame in accordance with the one
velocity,

(B) computing an error amount which is a function of
a difference in pixel amplitude attributable to said
mapping;

(C) designating a minimum one of the error amounts 
computed for said velocities as the error for said one 
velocity, whereby a respective error is associated 
with each of said regions and with each union of 
pairs of said regions without regard to velocity; and 

(4) merging qualified ones of said regions by the follow- 
ing steps: 

(A) computing for each pair of regions a merging scale 
which depends upon a gain comprising a function of 

(a) the sum of the errors of each pair of regions and 

(b) the error of the union of said pair of regions; 

(B) merging each pair of said regions for which said 
merging scale meets a predetermined criteria. 
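
By way of illustration only, steps (3) and (4)(A) above can be sketched as follows. The sketch assumes integer translational velocities, frames stored as nested lists, regions stored as sets of (row, column) coordinates, an error amount equal to the sum of absolute amplitude differences, a fixed penalty `oob_penalty` for pixels mapped outside the frame, and a gain equal to the sum of the separate errors minus the error of the union. None of these choices, nor the function names, is dictated by the claim.

```python
def region_error(ref, non_ref, region, velocity, oob_penalty=255):
    """Error of one velocity for one region (steps (3)(A) and (3)(B)).

    Each non-reference pixel (r, c) of `region` is mapped to
    (r + dr, c + dc) in the reference frame and the absolute amplitude
    differences are summed.
    """
    dr, dc = velocity
    rows, cols = len(ref), len(ref[0])
    total = 0
    for r, c in region:
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            total += abs(non_ref[r][c] - ref[rr][cc])
        else:
            total += oob_penalty  # assumed handling of pixels mapped off-frame
    return total


def best_error(ref, non_ref, region, velocities):
    """Minimum error over the candidate velocities (step (3)(C))."""
    return min(region_error(ref, non_ref, region, v) for v in velocities)


def merge_gain(ref, non_ref, region_a, region_b, velocities):
    """Assumed gain: sum of the separate errors minus the error of the union."""
    separate = (best_error(ref, non_ref, region_a, velocities)
                + best_error(ref, non_ref, region_b, velocities))
    union = best_error(ref, non_ref, region_a | region_b, velocities)
    return separate - union


ref = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120]]
# Rows 0 and 1 match the reference under velocity (0, 1); row 2 is static.
non_ref = [[20, 30, 40, 40],
           [60, 70, 80, 80],
           [90, 100, 110, 120]]
velocities = [(0, 0), (0, 1)]
a = {(0, 0), (0, 1)}
b = {(1, 0), (1, 1)}
c = {(2, 0), (2, 1)}

gain_ab = merge_gain(ref, non_ref, a, b, velocities)
gain_ac = merge_gain(ref, non_ref, a, c, velocities)
```

In this toy example regions `a` and `b` share one velocity, so their union is modeled as well as the separate regions, whereas `a` and `c` move differently and the single best velocity for their union leaves a residual error.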

2. The process of claim 1 wherein said merging scale 
further depends upon a cost comprising a function of (a) the 
sum of the lengths of the boundaries of each pair of regions 
and (b) the length of the boundary of the union of said pair 
of regions. 
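
By way of illustration only, the boundary lengths recited in claim 2 can be sketched as follows. The sketch assumes that a region is a set of (row, column) coordinates, that boundary length is counted in unit pixel edges under 4-connectivity, and that the cost is the sum of the two boundary lengths minus the boundary length of the union. The function names and this particular cost function are assumptions of the sketch.

```python
def boundary_length(region):
    """Count the unit pixel edges separating `region` from everything else."""
    length = 0
    for r, c in region:
        for neighbor in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if neighbor not in region:
                length += 1
    return length


def merge_cost(region_a, region_b):
    """Assumed cost: boundary removed when the two regions are merged."""
    return (boundary_length(region_a) + boundary_length(region_b)
            - boundary_length(region_a | region_b))


a = {(0, 0), (0, 1)}   # 1 x 2 block
b = {(1, 0), (1, 1)}   # 1 x 2 block directly below a
far = {(5, 5)}         # single pixel not touching a

cost_ab = merge_cost(a, b)
cost_far = merge_cost(a, far)
```

Under these assumptions the cost equals twice the length of the boundary shared by the two regions, so it is zero for regions that do not touch.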

3. The process of claim 1 wherein the step of determining 
an error for each one of at least some of said velocities 
comprises determining an error for each one of said veloci- 
ties. 

4. The process of claim 1 further comprising, after the step 
of merging: 

erasing the individual pairs of regions which have been 
merged and defining their unions as individual regions; 
and 
repeating said steps of (a) computing an error for each one
of at least some of said velocities, (b) computing a
merging scale and merging each pair of regions for
which the merging scale meets said criteria, whereby
said process comprises plural repetitive iterations.

5. The process of claim 4 wherein the step of determining
an error for each one of at least some of said velocities
comprises determining said error for a limited set of said
velocities, said limited set of said velocities corresponding
to those velocities associated with the N smallest errors
computed during a prior iteration of said process, wherein N
is an integer.

6. The process of claim 5 wherein each limited set of
velocities associated with the N smallest errors is different for
different regions, and wherein said step of determining an
error comprises:

designating as the maximum error for a given region the
largest error computed for that region in any prior
iteration of said process;

and wherein the step of computing said merging scale
comprises determining for each pair of regions whether
a velocity included in said limited velocity set of one of
said regions is not included in the limited velocity set
of the other of said pair of regions, and assigning as the
corresponding error for said other region said maximum
error.

7. The process of claim 1 wherein said mapping comprises
computing a new pixel amplitude in accordance with
a weighted average of pixel amplitudes mapped into said
reference frame, wherein the weight of each pixel amplitude
mapped into said reference frame is a decreasing function of
the mapped pixel's distance from a given pixel location in
said reference frame.

8. The process of claim 1 wherein said mapping step of
mapping pixels from said non-reference frame to said
reference frame is a forward mapping, said process further
comprising determining which ones of said pixels are
occluded by carrying out the following steps:

(I) determining which pixels were not matched from said
non-reference frame to said reference frame by said
merging step following said forward mapping step and
removing said pixels not matched from said reference
frame;

(II) performing said step of determining an error and said
step of merging, except that said mapping step comprises
a backward mapping of mapping from said
reference frame to said non-reference frame, said backward
mapping step employing a version of said reference
frame in which said pixels not matched have been
removed;

(III) determining which pixels were not matched from
said reference frame to said non-reference frame by
said merging step following said backward mapping
step and removing said pixels not matched from said
non-reference frame;

(IV) comparing the pixels remaining in said reference
frame with the pixels remaining in said non-reference
frame, and repeating steps I, II and III if there is a
difference beyond a predetermined threshold.

9. The process of claim 8 further comprising:

assigning a velocity to each remaining pixel in said
non-reference frame;

adding the remaining pixels of said non-reference frame
to said reference frame in accordance with the velocity
assigned to each pixel of said non-reference frame to
produce an enhanced frame.

10. The process of claim 9 further comprising deblurring
the image of said enhanced frame to produce a super frame.

11. The process of claim 1 wherein said dividing step
initializes said regions so that each pixel is an individual
region.

12. The process of claim 1 wherein said velocities comprise
at least one of: the set of translational velocities, the set
of rotational velocities, the set of zooms.

13. The process of claim 1 wherein said unions of pairs of
regions constitute unions of pairs of adjacent regions only.

14. The process of claim 2 wherein said merging scale is
computed as a ratio obtained by dividing said cost by said
gain, and wherein said predetermined criteria comprises a
maximum scalar value of said ratio above which merging is
disallowed.

15. The process of claim 14 wherein said scalar value is
selected in a range between an upper limit at which the entire
image is merged and a lower limit at which no pixels are
merged.

16. The process of claim 4 further comprising defining
said set of velocities as a simple set during the first one of
said iterations of said process, and supplementing said set of
velocities with additional velocities as the size of said
regions grows.

17. The process of claim 16 wherein said simple set
comprises the set of translational velocities, and said
additional velocities comprise the set of rotational velocities and
the set of zoom velocities.

18. The process of claim 10 wherein said reference and
non-reference frames lie in a moving sequence of frames
depicting an image having motion, and said process further
comprising designating one of said sequence of frames as
said reference frame and successively designating others of
said sequence of frames as said non-reference frame, and
performing all of the foregoing steps for each one of the
successive designations of said non-reference frame,
whereby said superframe contains information from all the
frames of said sequence.

19. The process of claim 18 further comprising designating
successive ones of said sequence of frames as said
reference frame and repeating all of said foregoing steps for
each designation of said reference frame so that a super
frame is constructed for each one of said sequence of frames.

20. The process of claim 9 wherein the step of assigning
a velocity to each remaining pixel comprises selecting the
velocity for the region of that pixel having the minimum
error.

21. A method of processing at least two image frames
each divisible into similar sets of regions, wherein one of
said frames is a reference frame and the other is a
non-reference frame, said method of processing comprising:

determining an error based upon discrepancy in pixel
amplitude for each one of at least some of a set of
velocities for modeling pixel motion between frames
for each one of said regions and for each union of pairs
of said regions, and wherein said determining an error
further comprises:

(A) mapping each pixel of said non-reference frame
into said reference frame in accordance with the one
velocity,

(B) computing an error amount which is a function of
a difference in pixel amplitude attributable to said
mapping;

(C) designating a minimum one of the error amounts
computed for said velocities as the error for said one
velocity, whereby a respective error is associated
with each of said regions and with each union of
pairs of said regions without regard to velocity; and,

merging a pair of said regions whose union tends to have
a smaller function of said error than the sum of similar 
functions of the separate regions of the pair. 

22. A method of processing at least two image frames 
each divisible into similar sets of regions, comprising: 

determining an error based upon discrepancy in pixel 
amplitude for each one of at least some of a set of 
velocities for modeling pixel motion between frames 
for each one of said regions and for each union of pairs 
of said regions; and, 

merging a pair of said regions whose union tends to have 
a smaller function of said error than the sum of similar
functions of the separate regions of the pair, and 
wherein said merging further comprises: 

(A) computing for each pair of regions a merging scale 
which depends upon a gain comprising a function of 

(a) the sum of the errors of each pair of regions and 

(b) the error of the union of said pair of regions; 

(B) merging each pair of said regions for which said 
merging scale meets a predetermined criteria. 

23. A method of processing at least two image frames 
each divisible into similar sets of regions, comprising: 

determining an error based upon discrepancy in pixel 
amplitude for each one of at least some of a set of 
velocities for modeling pixel motion between frames 
for each one of said regions and for each union of pairs 
of said regions, and wherein the step of determining an 
error further includes: 

a forward mapping step of mapping pixels from a 
non-reference one of said frames to a reference one
of said frames, said process further comprising deter- 
mining which ones of said pixels are occluded by 
carrying out the following steps: 

(I) determining which pixels were not matched from 
said non-reference frame to said reference frame 
by said merging step following said forward map- 
ping step and removing said pixels not matched 
from said reference frame; 

(II) performing said step of determining an error and 
said step of merging, except that said mapping 
step comprises a backward mapping of mapping 
from said reference frame to said non-reference 
frame, said backward mapping step employing a 
version of said reference frame in which said 
pixels not matched have been removed; 

(III) determining which pixels were not matched 
from said reference frame to said non-reference 
frame by said merging step following said back- 
ward mapping step and removing said pixels not 
matched from said non-reference frame; 

(IV) comparing the pixels remaining in said refer- 
ence frame with the pixels remaining in said 
non-reference frame, and repeating steps I, II and 
III if there is a difference beyond a predetermined 
threshold; and 

merging a pair of said regions whose union tends to have 
a smaller function of said error than the sum of similar 
functions of the separate regions of the pair. 